Preface


This book on probability theory and mathematical statistics is designed for 
a three-quarter course meeting four hours per week or a two-semester course 
meeting three hours per week. It is designed primarily for advanced seniors 
and beginning graduate students in mathematics, but it can also be used by
students in physics and engineering with strong mathematical backgrounds. 
Let me emphasize that this is a mathematics text and not a “cookbook.” 
It should not be used as a text for service courses. 

The mathematics prerequisites for this book are modest. It is assumed that 
the reader has had basic courses in set theory and linear algebra and a solid
course in advanced calculus. No prior knowledge of probability and/or
statistics is assumed. 

My aim is to provide a solid and well-balanced introduction to probability 
theory and mathematical statistics. It is assumed that students who wish to 
do graduate work in probability theory and mathematical statistics will be 
taking, concurrently with this course, a measure-theoretic course in analysis 
if they have not already had one. These students can go on to take
advanced-level courses in probability theory or mathematical statistics after completing
this course. 

This book consists of essentially three parts, although no such formal 
divisions are designated in the text. The first part consists of Chapters 1 
through 6, which form the core of the probability portion of the course. The 
second part, Chapters 7 through 11, covers the foundations of statistical 
inference. The third part consists of the remaining three chapters on special 
topics. For course sequences that separate probability and mathematical
statistics, the first part of the book can be used for a course in probability
theory, followed by a course in mathematical statistics based on the second
part and, possibly, one or more chapters on special topics.

The reader will find here a wealth of material. Although the topics covered 
are fairly conventional, the discussions and special topics included are not. 
Many presentations give far more depth than is usually the case in a book at
this level. Some special features of the book are the following: 
1. A well-referenced chapter on the preliminaries.

2. About 550 problems, over 350 worked-out examples, about 200
remarks, and about 150 references.

3. An advance warning to the reader wherever the details become too
involved. He can skip the later portion of the section in question
on first reading without destroying his continuity in any way.

4. Many results on characterizations of distributions (Chapter 5).

5. Proof of the central limit theorem by the method of operators and
proof of the strong law of large numbers (Chapter 6).

6. A section on minimal sufficient statistics (Chapter 8).

7. A chapter on special tests (Chapter 10).

8. A careful presentation of the theory of confidence intervals, including
Bayesian intervals and shortest-length confidence intervals (Chapter 11).

9. A chapter on the general linear hypothesis, which carries linear
models through to their use in basic analysis of variance (Chapter 12).

10. Sections on nonparametric estimation and robustness (Chapter 13).

11. Two sections on sequential estimation (Chapter 14).


The contents of this book were used in a one-year (two-semester) course 
that I taught three times at the Catholic University of America and once in 
a three-quarter course at Bowling Green State University. In the fall of 1973 
my colleague, Professor Eugene Lukacs, taught the first quarter of this same
course on the basis of my notes, which eventually became this book. I have
always been able to cover this book (with few omissions) in a one-year course, 
lecturing three hours a week. An hour-long problem session every week is 
conducted by a senior graduate student. 

In a book of this size there are bound to be some misprints, errors, and 
ambiguities of presentation. I shall be grateful to any reader who brings
these to my attention. 


V. K. ROHATGI


Bowling Green, Ohio 
February 1975 
Enumeration of Theorems
and References 


This book is divided into 14 chapters, numbered 1 through 14, and an
unnumbered chapter on the preliminaries. Each chapter is divided into several
sections. Lemmas, theorems, equations, definitions, remarks, figures, and
so on are numbered consecutively within each section. Thus Theorem i.j.k
refers to the kth theorem in Section j of Chapter i, Section i.j refers to the jth
section of Chapter i, and so on. Theorem j refers to the jth theorem of the
section in which it appears. A similar convention is used for equations except
that equation numbers are enclosed in parentheses. Each section is followed
by a set of problems for which the same numbering system is used.
References to the chapter on preliminaries begin with the letter P. This
chapter consists of three sections: P.1, P.2, and P.3. Theorems and equations
are numbered within each section as explained above. Thus Theorem P.2.14
refers to Theorem 14 of Section 2 of the chapter on preliminaries.
References are given at the end of the book and are denoted in the text by
numbers enclosed in square brackets, [ ]. If a citation is to a book, the
notation ([i], j) refers to the jth page of the reference numbered [i].
Introduction


In this chapter we state some basic results from set theory, advanced 
calculus, and linear algebra that we will have occasion to refer to quite 
frequently in the rest of the book. The following discussion is not meant to 
be self-contained and is provided mainly to establish the notation used in 
subsequent chapters. 


P.1 SETS AND CLASSES


In this book, whenever the word "set" is used it is assumed to designate a
subset of a given set $\Omega$ unless otherwise specified. As is usual the symbols
$\cup$ and $\cap$ are used for set-theoretic union and intersection. $A^c$ denotes the
complement of set $A$ (in $\Omega$). $A - B$ is the difference set $A \cap B^c$ of two sets,
$A$ and $B$. $A \subset B$ means that $A$ is a subset of $B$, and $A = B$ means $A \subset B$
and $B \subset A$. The empty or null set is denoted by $\phi$. For the union $A \cup B$ of
disjoint sets $A$, $B$ we write $A + B$.

In general, capital letters ($A$, $B$, $C$, etc.) denote sets, and lower case letters
($x$, $y$, $z$, etc.) represent points or elements of sets. Classes of sets are usually
denoted by German or English script letters ($\mathfrak{A}$, $\mathfrak{B}$, $\mathcal{S}$, etc.). We deal only
with nonempty classes of sets.

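We write $\bigcap \mathfrak{A} = \bigcap_{A \in \mathfrak{A}} A$ for the intersection and
$\bigcup \mathfrak{A} = \bigcup_{A \in \mathfrak{A}} A$ for the union of a class of sets $\mathfrak{A}$. A disjoint
class $\mathfrak{A}$ of sets is a class such that every two distinct sets of $\mathfrak{A}$ are
disjoint. The following identities are known as DeMorgan's Laws:


(1) $\left(\bigcup \mathfrak{A}\right)^c = \bigcap_{A \in \mathfrak{A}} A^c,$
(2) $\left(\bigcap \mathfrak{A}\right)^c = \bigcup_{A \in \mathfrak{A}} A^c.$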
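As a quick sanity check, DeMorgan's Laws can be verified on a small finite example. The sketch below is only an illustration: the universe `omega` and the class `family` are hypothetical choices, not taken from the text, and the helper names are introduced here for convenience.

```python
from functools import reduce


def complement(a, omega):
    """Complement of the set a relative to the universe omega."""
    return omega - a


def union_all(sets):
    """Union of a nonempty class of sets."""
    return reduce(lambda s, t: s | t, sets)


def intersection_all(sets):
    """Intersection of a nonempty class of sets."""
    return reduce(lambda s, t: s & t, sets)


# Hypothetical universe and class of subsets, chosen only for illustration.
omega = frozenset(range(10))
family = [frozenset({0, 1, 2}), frozenset({2, 3, 4}), frozenset({2, 4, 6, 8})]

# Law (1): complement of the union equals the intersection of complements.
lhs1 = complement(union_all(family), omega)
rhs1 = intersection_all([complement(a, omega) for a in family])

# Law (2): complement of the intersection equals the union of complements.
lhs2 = complement(intersection_all(family), omega)
rhs2 = union_all([complement(a, omega) for a in family])

print(lhs1 == rhs1, lhs2 == rhs2)
```

A finite check of this kind proves nothing in general, but it is a convenient way to catch a misread identity.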
Let $\{A_n\}$ be a sequence of sets. The set of all points $\omega \in \Omega$ that belong to
$A_n$ for infinitely many values of $n$ is known as the limit superior of the
sequence and is denoted by


$\limsup_{n \to \infty} A_n$ or $\overline{\lim}_{n \to \infty} A_n$.


The set of all points that belong to $A_n$ for all but a finite number of values
of $n$ is known as the limit inferior of the sequence $\{A_n\}$ and is denoted by

$\liminf_{n \to \infty} A_n$ or $\underline{\lim}_{n \to \infty} A_n$.


If

$\underline{\lim}_{n \to \infty} A_n = \overline{\lim}_{n \to \infty} A_n,$

we say that the limit exists and write $\lim_{n \to \infty} A_n$ for the common set and
call it the limit set.
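We have


(3) $\underline{\lim}_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k \subset \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k = \overline{\lim}_{n \to \infty} A_n.$

If the sequence $\{A_n\}$ is such that $A_n \subset A_{n+1}$ for $n = 1, 2, \ldots$, it is called
nondecreasing; if $A_n \supset A_{n+1}$, $n = 1, 2, \ldots$, it is called nonincreasing. If the
sequence $A_n$ is nondecreasing, we write $A_n \uparrow$; if $A_n$ is nonincreasing, we
write $A_n \downarrow$. Clearly, if $A_n \uparrow$ or $A_n \downarrow$, the limit exists and we have


(4) $\lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n$ if $A_n \uparrow$

and

(5) $\lim_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n$ if $A_n \downarrow$.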
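The limit inferior and limit superior involve infinitely many sets, so they cannot be computed by brute force in general. For a periodic sequence, however, every tail of the sequence contains exactly the sets of one period, so the tail intersections and tail unions in (3) do not depend on $n$. The sketch below uses that observation; the function name and the example sequence are assumptions made here for illustration.

```python
def lim_inf_sup(period_sets):
    """Return (lim inf, lim sup) of the periodic sequence
    A_1, ..., A_p, A_1, ..., A_p, ... built from period_sets.

    Every tail {A_k : k >= n} of a periodic sequence consists of the same
    p sets, so each tail intersection equals the intersection over one
    period and each tail union equals the union over one period.
    """
    sets = [frozenset(s) for s in period_sets]
    tail_intersection = frozenset.intersection(*sets)
    tail_union = frozenset.union(*sets)
    return tail_intersection, tail_union


# Hypothetical example: A_n = {0, 1} for odd n and A_n = {1, 2} for even n.
lim_inf, lim_sup = lim_inf_sup([{0, 1}, {1, 2}])
print(sorted(lim_inf), sorted(lim_sup))
```

Here the point 1 belongs to all the sets, while 0 and 2 each belong to infinitely many of them but not to all but finitely many, so the limit does not exist for this sequence.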


A nonempty class of subsets of $\Omega$ which is closed under the formation of
countable unions and complements and contains $\phi$ is known as a $\sigma$-field
(or a $\sigma$-algebra).

It is easily verified that the class of all subsets of $\Omega$ is a $\sigma$-field and that, if
$\mathcal{S}$ is a $\sigma$-field, it is closed under the formation of countable intersections
and contains the set $\Omega$. Moreover, given a nonempty class of sets $\mathfrak{A}$, there
exists a unique $\sigma$-field $\mathcal{S}_0$ which contains $\mathfrak{A}$ such that, if $\mathcal{S}$ is any other
$\sigma$-field containing $\mathfrak{A}$, then $\mathcal{S} \supset \mathcal{S}_0$. In other words, $\mathcal{S}_0$ is the smallest
$\sigma$-field containing $\mathfrak{A}$; $\mathcal{S}_0$ is known as the $\sigma$-field generated by $\mathfrak{A}$.
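When $\Omega$ is finite, the generated $\sigma$-field can be computed explicitly, because countable unions reduce to finite ones. The following is a minimal sketch under that finiteness assumption: it starts from the generating class and keeps adding complements and pairwise unions until nothing new appears. The function name and the example are hypothetical and serve only to illustrate the definition.

```python
def generated_sigma_field(omega, generators):
    """Smallest class of subsets of a FINITE omega that contains the
    generators and the empty set and is closed under complements and unions.
    """
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for a in current:
            # Candidates: the complement of a and its union with every known set.
            candidates = [omega - a] + [a | b for b in current]
            for c in candidates:
                if c not in field:
                    field.add(c)
                    changed = True
    return field


# Hypothetical example: omega = {1, 2, 3, 4}, generated by {1} and {1, 2}.
field = generated_sigma_field({1, 2, 3, 4}, [{1}, {1, 2}])
for member in sorted(field, key=lambda s: (len(s), sorted(s))):
    print(sorted(member))
```

In this example the atoms are {1}, {2}, and {3, 4}, so the generated class has $2^3 = 8$ members. Closure under intersections is not coded separately because it follows from complements and unions by DeMorgan's Laws.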

A $\sigma$-field of great interest in the study of probability is the Borel $\sigma$-field
of subsets of the real line, $\mathcal{R}$. It is the $\sigma$-field generated by the class of all
bounded semiclosed intervals of the form $(a, b]$ and will be denoted by $\mathfrak{B}$.
The sets of $\mathfrak{B}$ are called Borel sets. The following results are of key
importance.


Theorem 1 (Halmos [45], 62). Every countable set of real numbers is a
Borel set.


Theorem 2 (Halmos [45], 63). $\mathfrak{B}$ coincides with the $\sigma$-field generated by
the class of all open intervals or all intervals on the real line.


It follows that every Borel set of real numbers can be obtained by a
countable number of operations of unions, differences, and intersections
performed on intervals.

Let $A$ and $B$ be two sets (not necessarily subsets of the same set $\Omega$). We
write $A \times B$ for the Cartesian product which is the set of all ordered pairs
$(a, b)$, where $a \in A$ and $b \in B$. The best-known example of a Cartesian
product is the $n$-dimensional Euclidean space $\mathcal{R}_n$, which is the Cartesian product of
$n$ sets, each equal to $\mathcal{R}$. The $\sigma$-field of subsets of $\mathcal{R}_n$ generated by rectangles
of the form


$\{(x_1, x_2, \ldots, x_n): a_i < x_i \leq b_i,\ i = 1, 2, \ldots, n\}$


is known as the Borel $\sigma$-field on $\mathcal{R}_n$ and will be denoted by $\mathfrak{B}_n$. This $\sigma$-field
is also generated by product sets of the form $B_1 \times B_2 \times \cdots \times B_n$, where $B_i \in \mathfrak{B}$,
$i = 1, 2, \ldots, n$.

Let $\Omega$ be a nonempty set, and let $\mathfrak{A}$ be a class of subsets of $\Omega$. A function
$P$ that has domain $\mathfrak{A}$ and range contained in $\mathcal{R}$ is known as a real-valued
set function.


P.2 CALCULUS


In this section we state some results from calculus. The order in which these
results appear below is not necessarily the order in which these topics are
covered in a conventional course of lectures on calculus.

The reader who is unfamiliar with the theory of Lebesgue integration
may assume that the integral $\int_A f(x)\,dx$, whenever it appears, is defined in
the sense of Riemann. To ensure that this is the case, he may assume that $f$
is defined and continuous at all but a finite number of points and that $A$ is
either an interval or the union of a finite number of (disjoint) intervals.


1. The Mean-Value Theorem of Integral Calculus
Theorem 1 (Courant and John [16], 143). If $f$ and $g$ are continuous in
$[a, b]$ and $g$ is positive in $[a, b]$, then there exists a $\xi$ in $[a, b]$ such that


$\int_a^b f(x) g(x)\,dx = f(\xi) \int_a^b g(x)\,dx.$


In particular, there exists an $x_0$ in $[a, b]$ such that


$\int_a^b f(x)\,dx = (b - a) f(x_0).$


2. L'Hospital's Rule
Theorem 2 (Courant and John [16], 464). Let $f$ and $g$ have continuous
first derivatives $f'$ and $g'$, respectively, and let $f(a) = g(a) = 0$. If
$\lim_{x \to a} f'(x)/g'(x)$ exists, so does $\lim_{x \to a} f(x)/g(x)$ and

$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}.$


If $f'(a) = g'(a) = 0$ and $f''$ and $g''$ exist and are continuous, the same
result can be applied. This procedure can be continued. (Here $f''$ denotes
the second derivative.)


3. Convexity and Extrema
Convexity. A real-valued function $f$ is said to be convex if the inequality


(1) $f\left(\frac{x_1 + x_2}{2}\right) \leq \frac{1}{2}\left[f(x_1) + f(x_2)\right]$


holds for all values of $x_1$ and $x_2$. If $-f$ is convex, we call $f$ concave. If the
inequality in (1) is strict for $x_1 \neq x_2$, we say that $f$ is strictly convex. We shall
be interested only in continuous convex functions. If the second derivative
exists, the criterion for convexity reduces to $f''(x) \geq 0$.

A function $f$ defined on an interval $I$ has support at $x_0 \in I$ if there exists a
function $h(x)$, defined by $h(x) = f(x_0) + m(x - x_0)$, $m$ real, such that $h(x) \leq
f(x)$ for every $x \in I$. The graph of $h$ is called a line of support of $f$ at $x_0$.


Extrema. Let $f$ be defined on $D \subset \mathcal{R}$. We say that $f$ has a relative maximum
at $x_0 \in D$ if there exists an open interval $I_\delta$ containing $x_0$ such that $f(x_0) \geq
f(x)$ whenever $x \in D \cap I_\delta$. If $f(x_0) \geq f(x)$ for all $x \in D$, we say that $f$ has
a maximum at $x_0$. Similar definitions are given for relative minimum and
minimum. The term relative extremum is used to cover both terms, relative
minimum and relative maximum, and the term absolute maximum (minimum)
is used for the maximum (minimum) of $f$ in $D$.

Calculus does not provide any direct method to locate the extrema of a
function. It only permits us to locate relative extremum points.
Theorem 3 (Courant and John [16], 240). If $f$ has a relative extremum at
point $x_0$ in the interior of $D$, and if $f'(x_0)$ exists, then $f'(x_0) = 0$.


Theorem 4 (Courant and John [16], 242). Let $f$ be continuous on an
interval $D$. Let $x_0$ be an interior point of $D$ where either $f'(x_0)$ does not
exist or $f'(x_0) = 0$.

(a) If there exists an interval $(a, b)$ with $x_0 \in (a, b) \subset D$ such that

$f'(x) > 0$ for $x \in (a, x_0)$, and $f'(x) < 0$ for $x \in (x_0, b)$,

then $f$ has a relative maximum at $x_0$. In the case $f'(x) < 0$ for
$x \in (a, x_0)$ and $f'(x) > 0$ for $x \in (x_0, b)$, $f$ has a relative minimum at
$x_0$.

(b) If there exists an interval $(a, b) \subset D$ with $x_0 \in (a, b)$ such that

$f'(x) > 0$ for $x \in (a, x_0)$ and $x \in (x_0, b)$,

or

$f'(x) < 0$ for $x \in (a, x_0) \cup (x_0, b)$,

then $f$ does not have a relative extremum at $x_0$.


Theorem 5 (Courant and John [16], 243). Let $f$ be defined on $D$. Let
$f'(x)$ be defined for $x \in (a, b) \subset D$, and $f'(x_0) = 0$ for $x_0 \in (a, b)$.


(a) If $f''(x_0) < 0$, then $f$ has a relative maximum at $x_0$.
(b) If $f''(x_0) > 0$, then $f$ has a relative minimum at $x_0$.

4. Quadratic Forms
A real-valued function $Q$ defined on $\mathcal{R}_n$ by


$Q(x) = Q(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j, \quad x = (x_1, x_2, \ldots, x_n) \in \mathcal{R}_n,$


where $a_{ij}$ are real numbers, is known as a quadratic form in $n$ variables. We
say that $Q$ is symmetric if $a_{ij} = a_{ji}$ for all $i$ and $j$, positive semidefinite if
$Q(x) \geq 0$ for $x \neq 0$, and positive definite if $Q(x) > 0$ for $x \neq 0$. Similar
definitions are given for negative semidefinite and negative definite quadratic
forms. Let $\Delta = \det [a_{ij}]$, and let $\Delta_{n-k}$ denote the determinant with $n - k$
rows and columns, obtained by deleting the last $k$ rows and columns of $\Delta$.
Take $\Delta_0 = 1$.


Theorem 6 (Apostol [3], 151-152). A necessary and sufficient condition
for a symmetric quadratic form $Q$ to be positive definite is that the $n + 1$
numbers $\Delta_0, \Delta_1, \ldots, \Delta_n$ be positive.
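The criterion of Theorem 6 is easy to try on small matrices. The sketch below computes the leading principal minors $\Delta_0, \Delta_1, \ldots, \Delta_n$ with a plain cofactor expansion, which is adequate only for small $n$; the function names and the two test matrices are assumptions introduced here for illustration.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 0:
        return 1  # empty determinant, matching the convention Delta_0 = 1
    total = 0
    for j in range(n):
        # Minor obtained by deleting row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def leading_minors(a):
    """Return [Delta_0, Delta_1, ..., Delta_n] for the square matrix a."""
    n = len(a)
    return [det([row[:k] for row in a[:k]]) for k in range(n + 1)]


def is_positive_definite(a):
    """Criterion of Theorem 6; a is assumed to be symmetric."""
    return all(d > 0 for d in leading_minors(a))


# Hypothetical symmetric matrices, chosen only for illustration.
a = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
b = [[1, 2], [2, 1]]
print(leading_minors(a), is_positive_definite(a))
print(leading_minors(b), is_positive_definite(b))
```

For the second matrix the form is $x_1^2 + 4 x_1 x_2 + x_2^2$, which is negative at $(1, -1)$, in agreement with the negative minor.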
In the $n = 2$ case, the form $Q(x, y) = Ax^2 + 2Bxy + Cy^2$ is known as a
binary quadratic form, and the number $\Delta = AC - B^2$ is the discriminant of
$Q$. $Q$ is positive definite if and only if $A > 0$ and $\Delta > 0$, and it is positive
semidefinite if and only if we change the sign $> 0$ to $\geq 0$ (not both $>$).
See also Section P.3 below.


5. Orders of Magnitude: The "O" and "o" Notation

Let $f$ and $g$ be two functions, and assume that $g(x) > 0$ for sufficiently
large $x$. We say that $f(x)$ is at most of the order of $g(x)$ as $x \to \infty$, and we write
$f(x) = O[g(x)]$ as $x \to \infty$ if there exist an $x_0$ and a constant $c > 0$ such that
$|f(x)| < c\,g(x)$ for all $x \geq x_0$. Thus $f(x) = O[g(x)]$ means that $|f(x)|/g(x)$
is bounded for large $x$. We write $f(x) = O(1)$ to express the fact that $f$ is
bounded for large $x$. It is easily checked that


(a) if $f_1(x) = O[g_1(x)]$, $f_2(x) = O[g_2(x)]$, then $f_1(x) + f_2(x) = O[g_1(x) +
g_2(x)]$;

(b) if $a > 0$ is constant, $f(x) = O[a\,g(x)]$ implies that $f(x) = O[g(x)]$;

(c) if $f_1(x) = O[g_1(x)]$ and $f_2(x) = O[g_2(x)]$, then $f_1(x) f_2(x) =
O[g_1(x) g_2(x)]$.

Let $f$ and $g$ be both defined and positive for large $x$. We say that $f(x)$ is
of smaller order than $g(x)$ as $x \to \infty$, and we write $f(x) = o[g(x)]$ as $x \to \infty$ if
$\lim_{x \to \infty} f(x)/g(x) = 0$. We write $f(x) = o(1)$ to express the fact that $f(x) \to 0$
as $x \to \infty$. It is easily checked that

(a) $f(x) = o[g(x)] \Rightarrow f(x) = O[g(x)]$;
(b) $f_1(x) = O[g_1(x)]$, $f_2(x) = o[g_2(x)] \Rightarrow f_1(x) f_2(x) = o[g_1(x) g_2(x)]$.


The symbols $O$ and $o$ are also used if $x$ tends to a finite value.
6. Taylor's Expansion
Theorem 7 (Courant and John [16], 451-452). Let $n$ be a nonnegative
integer, and let $f^{(n+1)}(x)$ denote the $(n + 1)$st derivative of $f$. If
$f^{(n+1)}(x)$ is continuous in $|x - a| < h$, then

(2) $f(x) = f(a) + \sum_{k=1}^{n} \frac{(x - a)^k}{k!} f^{(k)}(a) + R_n,$

where the remainder $R_n$, which depends on $n$, $a$, and $x$, is given by


(3) $R_n = \frac{1}{n!} \int_a^x (x - t)^n f^{(n+1)}(t)\,dt.$


The Lagrange form of the remainder is given by

(4) $R_n = f^{(n+1)}(x_1) \frac{(x - a)^{n+1}}{(n + 1)!}$ for some unspecified $x_1 \in (a, x)$;


and the Cauchy form, by


(5) $R_n = f^{(n+1)}(x_1) \frac{(x - x_1)^n (x - a)}{n!}$ for some unspecified $x_1 \in (a, x)$.
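To see how the Lagrange form is used in practice, take $f(x) = e^x$ and $a = 0$ (an illustrative choice, not from the text). Every derivative of $f$ is $e^x$, so the Lagrange form of the remainder gives $|R_n| \leq e^{\max(x, 0)} |x|^{n+1}/(n + 1)!$. The sketch below compares the actual truncation error with this bound; the function names are hypothetical.

```python
import math


def taylor_exp(x, n):
    """Taylor polynomial of degree n for exp about a = 0, evaluated at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


def lagrange_bound_exp(x, n):
    """Bound on |R_n| from the Lagrange form of the remainder.

    The (n+1)st derivative of exp at an intermediate point x_1 between 0
    and x is e^{x_1}, which is at most e^{max(x, 0)}.
    """
    return math.exp(max(x, 0.0)) * abs(x) ** (n + 1) / math.factorial(n + 1)


for x, n in [(1.0, 5), (2.0, 8), (-1.5, 6)]:
    error = abs(math.exp(x) - taylor_exp(x, n))
    print(x, n, error, lagrange_bound_exp(x, n))
```

The bound is not sharp, since the intermediate point $x_1$ is unspecified, but it is guaranteed to dominate the actual error.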


7. Power Series
A series of the type

(6) $c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n + \cdots$


is known as a power series.


Theorem 8 (Courant and John [16], 541). If the power series (6) converges
for $x = x_0 \neq 0$, it converges absolutely for $x$ in the interval $(-|x_0|, |x_0|)$,
and uniformly in every closed interval $[-\eta, \eta]$, where $0 < \eta < |x_0|$.


Theorem 9 (Courant and John [16], 542-543). Let $f(x) = \sum_{n=0}^{\infty} c_n x^n$ be a
power series that converges for $x = x_0 \neq 0$. Then the power series may be
integrated and differentiated term by term any number of times in any closed
interval $[-\eta, \eta]$, where $0 < \eta < |x_0|$. The resulting power series is uniformly
and absolutely convergent in $[-\eta, \eta]$. Moreover, the $k$th integral or the $j$th
derivative of the power series equals, respectively, the $k$th integral or $j$th
derivative of $f$ in $[-\eta, \eta]$.


Theorem 10 (Courant and John [16], 544-545). The representation of a
function $f$ by a power series, say $f(x) = \sum_{k=0}^{\infty} c_k x^k$ with a nonzero radius of
convergence is unique and is given by


(7) $f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k,$

where $f^{(0)}(0) = f(0)$.


The expansion (7) is sometimes referred to as a Maclaurin expansion. An
important expansion is that of the function $f(x) = (1 + x)^{\alpha}$ into a power
series. Let us write

(8) $\binom{\alpha}{\nu} = \frac{\alpha(\alpha - 1) \cdots (\alpha - \nu + 1)}{1 \cdot 2 \cdots \nu},$

where $\nu$ is an integer ($> 0$) and $\alpha$ is an arbitrary real number. Set $\binom{\alpha}{0} = 1$;
$\binom{\alpha}{\nu}$ are known as binomial coefficients. For $|x| < 1$ we have

(9) $(1 + x)^{\alpha} = \sum_{\nu=0}^{\infty} \binom{\alpha}{\nu} x^{\nu},$

which is known as a binomial series. If $\alpha$ is a positive integer,

$\binom{\alpha}{\nu} = \frac{\alpha(\alpha - 1) \cdots (\alpha - \nu + 1)}{1 \cdot 2 \cdots \nu} = \frac{\alpha!}{(\alpha - \nu)!\,\nu!}$


is the ordinary binomial coefficient. Here $\binom{\alpha}{\nu} = 0$ if $\alpha < \nu$, and (9) reduces to

(10) $(1 + x)^{\alpha} = \sum_{\nu=0}^{\alpha} \binom{\alpha}{\nu} x^{\nu}.$


For the numerator on the right-hand side of (8) we write ${}_{\alpha}P_{\nu}$. It is easily
verified that


(11) $\binom{\alpha}{\nu - 1} + \binom{\alpha}{\nu} = \binom{\alpha + 1}{\nu}$

for any real number $\alpha$ and positive integer $\nu$. Also,


(12) $\binom{-\alpha}{\nu} = (-1)^{\nu} \binom{\alpha + \nu - 1}{\nu}.$
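The generalized binomial coefficient in (8) and the binomial series (9) can be checked numerically. The sketch below is illustrative only: the function names, the truncation at a fixed number of terms, and the sample values of $\alpha$ and $x$ are assumptions made here, not part of the text.

```python
def binom(alpha, nu):
    """Generalized binomial coefficient (8) for real alpha and integer nu >= 0.

    Built as the product of (alpha - j) / (j + 1) for j = 0, ..., nu - 1;
    the empty product for nu = 0 gives the convention value 1.
    """
    result = 1.0
    for j in range(nu):
        result *= (alpha - j) / (j + 1)
    return result


def binomial_series(alpha, x, terms):
    """Partial sum of the binomial series (9) with the given number of terms."""
    return sum(binom(alpha, nu) * x ** nu for nu in range(terms))


# Hypothetical sample values with |x| < 1.
alpha, x = 0.5, 0.2
print(binomial_series(alpha, x, 60), (1 + x) ** alpha)

# Identity (11) for a non-integer alpha.
print(binom(-1.5, 3) + binom(-1.5, 4), binom(-0.5, 4))
```

For a positive integer $\alpha$ the product picks up a zero factor as soon as $\nu > \alpha$, which is why the series terminates and reduces to the finite sum (10).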


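8. Stirling's Approximation


Theorem 11 (Courant and John [16], 504). As $n \to \infty$,


(13) $\frac{n!}{\sqrt{2\pi}\, n^{n + 1/2} e^{-n}} \to 1;$

more precisely,

(14) $\sqrt{2\pi}\, n^{n + 1/2} e^{-n} < n! < \sqrt{2\pi}\, n^{n + 1/2} e^{-n} \left(1 + \frac{1}{4n}\right).$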
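The two-sided bound can be tried numerically. The sketch below assumes that (14) reads $\sqrt{2\pi}\, n^{n+1/2} e^{-n} < n! < \sqrt{2\pi}\, n^{n+1/2} e^{-n} (1 + 1/(4n))$, which is how the garbled display is interpreted here; the function name is hypothetical, and $n$ is kept at 100 or below so that the floating-point powers do not overflow.

```python
import math


def stirling_bounds(n):
    """Lower and upper bounds for n! under the assumed reading of (14)."""
    base = math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)
    return base, base * (1 + 1 / (4 * n))


for n in (1, 5, 10, 50, 100):
    lower, upper = stirling_bounds(n)
    exact = math.factorial(n)
    # Ratio n! / lower tends to 1 from above, in line with (13).
    print(n, lower < exact < upper, exact / lower)
```

The printed ratio behaves like $1 + 1/(12n)$, comfortably inside the stated interval.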


9. Infinite Products


Let $P_n = \prod_{k=1}^{n} p_k$. If $P_n \to P$ ($\neq 0$), we say that the infinite product $\prod_{k=1}^{\infty} p_k$
converges to the limit $P$. If $P_n$ does not converge to a finite nonzero limit,
we say that the product $\prod_{k=1}^{\infty} p_k$ is divergent. If $P_n \to 0$, we say that $\prod p_k$
diverges to 0. If each of a finite number of factors has the value 0, the
product is convergent if it converges when these factors are removed. In
such cases the product has the value 0.


Theorem 12 (Courant and John [16], 561-562). A necessary and sufficient
condition for $\prod_{k=1}^{\infty} p_k$, $p_k > 0$, to converge is that $\sum_{k=1}^{\infty} \log p_k$ converges.

Theorem 13 (Apostol [3], 381-382).


(a) Let each $p_k > 0$. Then the product $\prod_{k=1}^{\infty} (1 + p_k)$ converges if and
only if $\sum_{k=1}^{\infty} p_k$ converges.
(b) Let each $p_k > 0$. Then the product $\prod_{k=1}^{\infty} (1 - p_k)$ converges if and
only if the series $\sum_{k=1}^{\infty} p_k$ converges.


10. Monotonic Functions


Let $f$ be defined on $D \subset \mathcal{R}$. We say that $f$ is nondecreasing (nonincreasing)
if $x < y \Rightarrow f(x) \leq f(y)$ ($f(x) \geq f(y)$) for $x, y \in D$. Also $f$ is said to be (strictly)
increasing if $x < y \Rightarrow f(x) < f(y)$. Similarly, $f$ is decreasing if $x < y \Rightarrow
f(x) > f(y)$. We say that $f$ is monotonic on $D$ if $f$ is either increasing or
decreasing. If $f$ is increasing (decreasing), we write $f \uparrow$ ($\downarrow$).


Theorem 14 (Apostol [3], 78). If $f$ is nondecreasing on $\mathcal{R}$, then $f(x_0+)
= \lim_{h \downarrow 0} f(x_0 + h)$ and $f(x_0-) = \lim_{h \downarrow 0} f(x_0 - h)$ both exist for each
$x_0 \in \mathcal{R}$ and are finite. The limits at infinity $\lim_{t \to -\infty} f(t) = f(-\infty)$ and
$\lim_{t \to +\infty} f(t) = f(+\infty)$ exist; the former may be $-\infty$ and the latter may
be $+\infty$.


At each point $x_0 \in \mathcal{R}$,

$f(x_0-) \leq f(x_0) \leq f(x_0+)$

and $f$ is continuous at $x_0$ if and only if $f(x_0-)$ and $f(x_0+)$ are equal. We
say that $f$ has a jump at $x_0$ if and only if $f(x_0-)$ and $f(x_0+)$ are unequal.
It follows that for a nondecreasing function $f$ the only possible kind of
discontinuity is a jump discontinuity. If there is a jump at $x_0$, we say that
$x_0$ is a jump point of $f$ and call $f(x_0+) - f(x_0-)$ the size of the jump or
simply the jump of $f$ at $x_0$.

Let $f$ be a continuous monotonic function on $\mathcal{R}$. We shall be interested
in the inverse function. Let $f(-\infty)$ and $f(+\infty)$ be the limits at infinity
(which may be infinite), and let


$\alpha = \min\{f(-\infty), f(+\infty)\}, \quad \beta = \max\{f(-\infty), f(+\infty)\}.$

Theorem 15 (Courant and John [16], 207). Let $f$ be differentiable at every
$x \in \mathcal{R}$ and suppose that either $f'(x) > 0$ for all $x$ or $f'(x) < 0$ for all $x$. Then


(a) $f$ is increasing if $f' > 0$ and decreasing if $f' < 0$,
(b) $f$ has an inverse function that is continuous and monotone in $(\alpha, \beta)$,
(c) the inverse function $x = f^{-1}(y)$ is differentiable and satisfies


$\frac{dx}{dy} \cdot \frac{dy}{dx} = 1,$


that is,


(15) $\frac{d}{dy} f^{-1}(y) = \left[\left.\frac{d}{dx} f(x)\right|_{x = f^{-1}(y)}\right]^{-1}.$
11. The Change of Variable(s)


Theorem 16 (Apostol [3], 216). Let $g$ have a continuous derivative on an
interval $S$ with end points $c$ and $d$, and let $f$ be continuous on $g(S)$. Let
$a = g(c)$ and $b = g(d)$, and define $F$ by the equation


$F(x) = \int_a^x f(t)\,dt, \quad x \in g(S)$†.


Then, for each $x \in S$, $\int_c^x f[g(t)]\,g'(t)\,dt$ exists and has the value $F[g(x)]$.
In particular,


(16) $\int_a^b f(x)\,dx = \int_c^d f[g(t)]\,g'(t)\,dt.$


If $g'$ never vanishes on $S$ and $g' > 0$ on $S = [c, d]$, say, then $a < b$ and
$X = g(S) = [a, b]$. Thus (16) becomes

$\int_X f(x)\,dx = \int_{g^{-1}(X)} f[g(t)]\,g'(t)\,dt.$

If $g' < 0$ on $S$, then $X = g(S) = [b, a]$ and (16) becomes


$\int_X f(x)\,dx = -\int_{g^{-1}(X)} f[g(t)]\,g'(t)\,dt.$


In either case


(17) $\int_X f(x)\,dx = \int_{g^{-1}(X)} f[g(t)]\,|g'(t)|\,dt.$


Result (17) can be easily extended to cover the case in which g is not 
always monotonic but its domain can be divided into a finite number of 
subintervals in each of which g is monotonic. 
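Form (17), with the absolute value of $g'$, can be checked numerically for a decreasing substitution. The sketch below is a rough illustration using a composite midpoint rule; the integrand $f(x) = e^x$, the map $g(t) = 3 - 2t$ on $S = [0, 1]$, and the helper names are all assumptions chosen here, not taken from the text.

```python
import math


def midpoint_integral(func, lo, hi, steps=10_000):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (i + 0.5) * width) for i in range(steps))


# Illustrative choices: f(x) = e^x and the decreasing map g(t) = 3 - 2t on
# S = [0, 1], so that X = g(S) = [1, 3] and g'(t) = -2 < 0.
f = math.exp
g = lambda t: 3.0 - 2.0 * t
g_prime = lambda t: -2.0

# Left side of (17): integral of f over X = [1, 3].
lhs = midpoint_integral(f, 1.0, 3.0)
# Right side of (17): integral of f[g(t)] |g'(t)| over g^{-1}(X) = [0, 1].
rhs = midpoint_integral(lambda t: f(g(t)) * abs(g_prime(t)), 0.0, 1.0)
exact = math.exp(3.0) - math.exp(1.0)

print(lhs, rhs, exact)
```

Dropping the absolute value would flip the sign of the right side here, which is exactly the case $g' < 0$ treated above.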

It is in form (17) that we generalize the result of change of variables to
multiple integrals. For ease in notation we state the result in the two-variable
case.


Theorem 17 (Apostol [3], 271). Let

$u = h_1(x, y), \quad v = h_2(x, y),$

together with the inverse functions


$x = g_1(u, v), \quad y = g_2(u, v),$


define a one-to-one mapping of open sets $A^* \subset \mathcal{R}_2$ onto $B^* \subset \mathcal{R}_2$. Let
$A \subset A^*$, and $B = h(A) \subset B^*$, where $h = (h_1, h_2)$. Let the first-order partial
derivatives of $g_1$, $g_2$ be continuous, and the Jacobian


$J_g(u, v) = \begin{vmatrix} \dfrac{\partial g_1}{\partial u} & \dfrac{\partial g_1}{\partial v} \\ \dfrac{\partial g_2}{\partial u} & \dfrac{\partial g_2}{\partial v} \end{vmatrix}, \quad g = (g_1, g_2),$


be never zero on $B^*$. Let $f$ be a real-valued function defined and continuous
on $A$. Then


(18) $\iint_A f(x, y)\,dx\,dy = \iint_{h(A)} f[g_1(u, v), g_2(u, v)]\,|J_g(u, v)|\,du\,dv.$


†For definitions of the image of set $S$ under function $g$, and the inverse image of set $X$ under $g$,
see P.2.16 below.


Theorem 17 can also be extended to the case in which $A^*$ can be written
as a disjoint union of a finite number of sets, $A_1, A_2, \ldots, A_k$, say, such that
the transformation from each $A_i$ to $B^*$ is one to one onto.


12. Interchange of Limit Operations
Let

$g_1(x) + g_2(x) + \cdots + g_n(x) + \cdots$


be an infinite series of functions which converges uniformly to $g(x)$. If every
$g_n(x)$, $n = 1, 2, \ldots$, is defined on $[a, b]$ and continuous at $x_0 \in [a, b]$, then
$g(x)$ is continuous at $x_0$.


Theorem 18 (Courant and John [16], 537-538).


(a) Let $\{g_n(x)\}$ be a sequence of integrable functions on $[a, b]$ which
converges uniformly to a function $g(x)$. Then $g$ is integrable, and


$\lim_{n \to \infty} \int_a^b g_n(x)\,dx = \int_a^b g(x)\,dx.$

(b) Let

$g_1(x) + g_2(x) + \cdots + g_n(x) + \cdots$


be an infinite series of functions which converges uniformly to $g(x)$.
If every $g_n(x)$, $n = 1, 2, \ldots$, is defined and integrable on $[a, b]$, then
$g$ is integrable on $[a, b]$; $\sum_{n=1}^{\infty} \int_a^b g_n(x)\,dx$ converges and


$\int_a^b g(x)\,dx = \sum_{n=1}^{\infty} \int_a^b g_n(x)\,dx.$


Theorem 19 (Courant and John [16], 539). If, on differentiating a
convergent infinite series $\sum_{n=1}^{\infty} g_n(x) = g(x)$ of differentiable functions on $[a, b]$
term by term, we obtain a uniformly convergent series of continuous terms
$\sum_{n=1}^{\infty} g_n'(x)$, then $\sum_{n=1}^{\infty} g_n(x)$ converges uniformly and


$\frac{d}{dx} \sum_{n=1}^{\infty} g_n(x) = \sum_{n=1}^{\infty} g_n'(x).$
13. Interchange of Limit Operations in Infinite Integrals


Integrals over infinite intervals are usually referred to as infinite integrals,
and integrals with unbounded integrands are known as improper
integrals. If $f$ is integrable on every finite closed subinterval of $[a, \infty)$ and
$\lim_{b \to \infty} \int_a^b f(x)\,dx$ exists, we say that the integral $\int_a^{\infty} f(x)\,dx$ converges.
Similarly, if $f$ is integrable in every finite closed subinterval of $(-\infty, a]$ and
$\lim_{b \to -\infty} \int_b^a f(x)\,dx$ exists, we say that $\int_{-\infty}^a f(x)\,dx$ converges. If both $\int_a^{\infty} f(x)\,dx$
and $\int_{-\infty}^a f(x)\,dx$ exist, then $\int_{-\infty}^a f(x)\,dx + \int_a^{\infty} f(x)\,dx$ is denoted by $\int_{-\infty}^{\infty} f(x)\,dx$.
It should be noted that $\lim_{b \to \infty} \int_{-b}^{b} f(x)\,dx$ may exist but $\int_{-\infty}^{\infty} f(x)\,dx$ may not
converge. If $\int_{-\infty}^{\infty} f(x)\,dx$ converges, then


$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_{-b}^{b} f(x)\,dx.$


If $f$ is bounded and integrable in $[a, b - \varepsilon]$ for every $\varepsilon > 0$ and the limit
$\lim_{\varepsilon \to 0+} \int_a^{b - \varepsilon} f(x)\,dx$ exists, we write


$\int_a^b f(x)\,dx = \lim_{\varepsilon \to 0+} \int_a^{b - \varepsilon} f(x)\,dx,$


and call $\int_a^b f(x)\,dx$ an improper integral.
Similarly, we may consider integrals of the type $\int_a^b f(x, t)\,dx$ or
$\int_c^d \int_a^b f(x, t)\,dx\,dt$, where any or all of the numbers $a$, $b$, $c$, $d$ may be infinite.


Theorem 20 (Miller [82], 157). Let $f(x, t)$ be continuous for $t$ in an
interval $I$ and for all $x \geq a$. If $\int_a^{\infty} f(x, t)\,dx$ converges uniformly for $t \in I$ to the
function $\phi(t)$, then $\phi$ is continuous on $I$.


Theorem 21 (Miller [82], 157). Let $f(x, t)$ and $(\partial/\partial t) f(x, t)$ be continuous
for $t$ in an interval $I$ and for all $x \geq a$. Let $\int_a^{\infty} f(x, t)\,dx$ and
$\int_a^{\infty} (\partial/\partial t) f(x, t)\,dx$ converge uniformly for $t \in I$. Then


$\frac{d}{dt} \int_a^{\infty} f(x, t)\,dx = \int_a^{\infty} \frac{\partial}{\partial t} f(x, t)\,dx$

for all $t \in I$.


Theorem 22 (Miller [82], 161). Let $f(x, t)$ be continuous for all $x \geq a$ and
all $t \geq c$. Let $\int_a^{\infty} f(x, t)\,dx$ converge uniformly for $t$ in every closed, bounded
subinterval of $[c, \infty)$, and $\int_c^{\infty} f(x, t)\,dt$ converge uniformly for $x$ in every
closed, bounded subinterval of $[a, \infty)$. Let $\int_a^{\infty} \int_c^{\xi} f(x, t)\,dt\,dx$ converge
uniformly for $\xi \geq c$. Then $\int_a^{\infty} \int_c^{\infty} f(x, t)\,dt\,dx$ and $\int_c^{\infty} \int_a^{\infty} f(x, t)\,dx\,dt$ exist
and are equal.


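Theorem 23 (Apostol [3], 451). Let $\{f_n\}$ be a sequence of functions defined
on $[a, \infty)$ such that $f_n(x) \geq 0$ for $x \geq a$ and $n = 1, 2, \ldots$. If


$\int_a^b \sum_{n=1}^{\infty} f_n(x)\,dx = \sum_{n=1}^{\infty} \int_a^b f_n(x)\,dx$

holds for every $b \geq a$, then

(19) $\int_a^{\infty} \sum_{n=1}^{\infty} f_n(x)\,dx = \sum_{n=1}^{\infty} \int_a^{\infty} f_n(x)\,dx$

provided that either side of (19) is convergent.

Theorem 24 (Apostol [3], 373-374). If either of the iterated series
$\sum_{m=1}^{\infty} \left(\sum_{n=1}^{\infty} a_{mn}\right)$ or $\sum_{n=1}^{\infty} \left(\sum_{m=1}^{\infty} a_{mn}\right)$ is absolutely convergent, then so is
the other and their sums are the same.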


14. The Laplace Transform


If the integral $\int_0^{\infty} e^{-st} f(s)\,ds$ converges for some value of $t$, we say that the
function defined by


(20) $M(t) = \int_0^{\infty} e^{-st} f(s)\,ds$


is the unilateral Laplace transform of $f$. In probability theory we are
interested in the integral


(21) $M(t) = \int_{-\infty}^{\infty} e^{-st} f(s)\,ds,$


which, if it converges for some value of $t$, is known as the bilateral Laplace
transform of $f$. Since, however,


$\int_{-\infty}^{\infty} e^{-st} f(s)\,ds = \int_0^{\infty} e^{-st} f(s)\,ds + \int_0^{\infty} e^{st} f(-s)\,ds,$


the study of a bilateral transform is reduced to that of the sum of two
unilateral transforms in one of which the variable $t$ has been replaced by $-t$.


Theorem 25 (Widder [137], 442). If the integral in (20) converges at $t = t_0$,
it converges for $t > t_0$.
Theorem 26 (Widder [137], 446-447). Let $M(t)$ be the Laplace transform
of $f$ for $t > t_0$. Then we can differentiate the integral in (20) under the sign
of integration any number of times, that is,


$M^{(k)}(t) = \int_0^{\infty} (-s)^k e^{-st} f(s)\,ds$ for $t > t_0$, $k = 1, 2, \ldots$.

We may therefore expand $M(t)$ in a Maclaurin series,

$M(t) = \sum_{k=0}^{\infty} M^{(k)}(0) \frac{t^k}{k!},$

in the interval of convergence.


Theorem 27 (Widder [137], 460). The Laplace transform $M$ of a continuous
function $f$ is unique.


15. Absolute Continuity


A real-valued function $f$ defined on $[a, b]$ is said to be absolutely continuous
on $[a, b]$ if, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that


$\sum_{i=1}^{n} |f(x_i') - f(x_i)| < \varepsilon$

for every finite collection $\{(x_i, x_i')\}$ of disjoint subintervals with

$\sum_{i=1}^{n} |x_i' - x_i| < \delta.$


It is clear that an absolutely continuous function is continuous. Some
other properties of absolutely continuous functions are given in Theorems
28 and 29.


Theorem 28 (Royden [107], 90). If $f$ is absolutely continuous, then $f'(x)$
exists almost everywhere.


Theorem 29 (Royden [107], 91). A function $F$ is an indefinite integral if
and only if it is absolutely continuous. Thus every absolutely continuous
function is the indefinite integral of its derivative.

16. Borel-Measurable Functions


Let $f$ be a function from a set $\Omega$ into a set $R$. Let $A \subset \Omega$. By the image
of $A$ under $f$ we mean the set of elements $y$ of $R$ such that $y = f(\omega)$ for
some $\omega \in A$. We denote this image by $f(A)$. If $B \subset R$, the inverse image of
$B$ under $f$ is the set of those $\omega \in \Omega$ for which $f(\omega) \in B$. We write

$f^{-1}(B) = \{\omega \in \Omega: f(\omega) \in B\}.$

It can be shown that $f^{-1}$ has the following properties:


(22) $f^{-1}\left(\bigcup_{\lambda \in \Lambda} B_\lambda\right) = \bigcup_{\lambda \in \Lambda} f^{-1}(B_\lambda),$

(23) $f^{-1}\left(\bigcap_{\lambda \in \Lambda} B_\lambda\right) = \bigcap_{\lambda \in \Lambda} f^{-1}(B_\lambda),$

(24) $f^{-1}(B^c) = \left(f^{-1}(B)\right)^c.$


Let $\Omega = \mathcal{R}_n$, and $\mathfrak{B}_n$ be the Borel $\sigma$-field on $\mathcal{R}_n$. A function $f: \mathcal{R}_n \to \mathcal{R}_m$
($1 \leq m \leq n$) is said to be Borel measurable if $f^{-1}(B_m) \in \mathfrak{B}_n$ for every
$m$-dimensional Borel set $B_m \in \mathfrak{B}_m$. The case $m = n = 1$ is of particular
importance. A real-valued function $f$ of a real variable $x$ is Borel measurable if
the set $\{x: -\infty < f(x) \leq y\}$ is a Borel set for every real number $y$.
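The inverse-image properties stated in this subsection (that $f^{-1}$ commutes with unions, intersections, and complements) can be illustrated on finite sets. The sketch below uses a function that is not one-to-one to stress that no injectivity is needed; the domain, the target set, the function, and the helper name are hypothetical choices made for this example.

```python
def inverse_image(f, domain, b):
    """Set of points w in domain with f(w) in b."""
    return frozenset(w for w in domain if f(w) in b)


# Hypothetical finite setting: omega = {-3, ..., 3}, target set R = {0, ..., 9},
# and the non-injective map f(w) = w^2.
omega = frozenset(range(-3, 4))
target = frozenset(range(10))
f = lambda w: w * w

b1 = frozenset({0, 1})
b2 = frozenset({1, 4})

# Inverse image of a union versus union of inverse images.
union_ok = inverse_image(f, omega, b1 | b2) == (
    inverse_image(f, omega, b1) | inverse_image(f, omega, b2))
# Inverse image of an intersection versus intersection of inverse images.
intersection_ok = inverse_image(f, omega, b1 & b2) == (
    inverse_image(f, omega, b1) & inverse_image(f, omega, b2))
# Inverse image of a complement (in the target) versus complement (in omega).
complement_ok = inverse_image(f, omega, target - b1) == (
    omega - inverse_image(f, omega, b1))

print(union_ok, intersection_ok, complement_ok)
```

These identities are what make inverse images, rather than direct images, the natural tool for defining measurability.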


P.3 LINEAR ALGEBRA 


In the following, $x = (x_1, x_2, \ldots, x_n)$ will denote a row vector of real
numbers, and the set of all vectors $x$ will be denoted by $V_n$ ($= \mathcal{R}_n$). $x'$ denotes
the column vector $(x_1, x_2, \ldots, x_n)'$, $0_n$ is the zero vector $(0, 0, \ldots, 0)$ in $V_n$, and
$1_n$ is the unit vector $(1, 1, \ldots, 1)$ in $V_n$. The scalar (or inner) product of
$x$ and $y$ in $V_n$ is the scalar $\sum_{i=1}^{n} x_i y_i$ and we write $xy' = \sum_{i=1}^{n} x_i y_i$.
Clearly $xy' = yx'$. Two nonzero vectors $x$, $y$ in $V_n$ are orthogonal if and
only if $xy' = 0$.

For a set of $s$ vectors $\{\alpha_1, \alpha_2, \ldots, \alpha_s\}$ in $V_n$, the vector space $V$ spanned
by $\{\alpha_1, \alpha_2, \ldots, \alpha_s\}$ is the set of all vectors that are linear combinations of
the $\alpha_i$'s. $V$ is also known as a linear subspace of $V_n$. The set of vectors
$\{\alpha_1, \alpha_2, \ldots, \alpha_s\}$ is said to be linearly dependent if there exists a set of scalars
$\{a_1, a_2, \ldots, a_s\}$ not all zero and such that $a_1 \alpha_1 + a_2 \alpha_2 + \cdots + a_s \alpha_s = 0_n$.
If no such scalars exist, we say that the set is linearly independent.

Let $V$ be a vector space. A set of linearly independent vectors that span
$V$ is known as a basis for $V$, and the dimension of $V$ is the number of vectors
in any basis for $V$. Every vector space has a basis, and any two bases for a
vector space contain the same number of vectors (Scheffé [111], 378-379).

If $V_r$ is an $r$-dimensional vector space of $n$-tuples contained in $V_n$, we write
$V_r \subset V_n$. Let $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ be a basis for $V_r$ and $x \in V_r$. The coefficient
$a_i$ ($i = 1, 2, \ldots, r$) of $\alpha_i$ in the unique linear representation $x = \sum_{i=1}^{r} a_i \alpha_i$
of $x$ in terms of vectors of the basis is called the $i$th coordinate of $x$ with
respect to the basis $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$. A basis $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ for $V_r \subset V_n$ is
said to be orthonormal if the $r$ vectors $\alpha_i$ are pairwise orthogonal and have
unit norm (or length), namely, $\|\alpha_i\| = (\alpha_i \alpha_i')^{1/2} = 1$.
Theorem 1 (Scheffé [111], 381).


(a) If the vectors $\alpha_1, \alpha_2, \ldots, \alpha_r$ are pairwise orthogonal (and nonzero),
they are linearly independent.
(b) Any linearly independent set of $r$ vectors in $V_r \subset V_n$ is a basis for $V_r$.


It follows that any set of $r$ (nonzero) orthogonal vectors in $V_r \subset V_n$ is a
basis for $V_r$.


Theorem 2 (Scheffé [111], 382). Let $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ be an arbitrary basis
for $V_r$. Then there exists an orthonormal basis $\{\gamma_1, \gamma_2, \ldots, \gamma_r\}$ for $V_r$ such
that each $\gamma_i$ is a linear combination of the $\alpha_j$.


Theorem 3 (Scheffé [111], 382). If $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ is an orthonormal basis
for $V_r \subset V_n$, it can always be extended to an orthonormal basis
$\{\alpha_1, \alpha_2, \ldots, \alpha_r, \alpha_{r+1}, \ldots, \alpha_n\}$ for $V_n$.


Capital letters will denote matrices. Let $A^{m \times n}$ be an $m \times n$ ($m$ rows, $n$
columns) matrix of real numbers $a_{ij}$, $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$. We
write $A = (a_{ij})$ and suppress the superscript $m \times n$. The transpose of $A$ will be
denoted by $A'$. The zero matrix will be denoted by $0$, and the $n \times n$ identity
matrix by $I_n$. We recall that the matrix product of an $m \times n$ matrix $A = (a_{ij})$
by an $n \times r$ matrix $B = (b_{jk})$ is an $m \times r$ matrix $C = (c_{ik})$, where
$c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}$, and we write $C = AB$. We note that the matrix product is
not commutative.
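The product rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `matmul` and the two small matrices are assumptions chosen for the example, and matrices are represented as plain lists of rows.

```python
def matmul(a, b):
    """Return C = AB for an m x n matrix a and an n x r matrix b (lists of rows)."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("number of columns of A must equal number of rows of B")
    r = len(b[0])
    # c[i][k] is the sum over j of a[i][j] * b[j][k]
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(r)]
            for i in range(len(a))]


# Two hypothetical 2 x 2 matrices whose products differ in the two orders.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
AB = matmul(A, B)
BA = matmul(B, A)
print(AB, BA, AB == BA)
```

Multiplying by `B` on the right swaps the columns of `A`, while multiplying on the left swaps its rows, which is why the two products disagree.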

A square matrix of order $n$ (that is, an $n \times n$ matrix) $A$ is said to be
singular if its determinant $|A| = 0$; otherwise we call it nonsingular. If there
exists a matrix $B$ such that $AB = BA = I_n$, we call $B$ the inverse of $A$ and
write $B = A^{-1}$. It is well known (Scheffé [111], 393) that a square matrix $A$
has an inverse if and only if $A$ is nonsingular. Then $A^{-1}$ is unique. If $A$ is
nonsingular, $(A')^{-1} = (A^{-1})'$; and if $A$ and $B$ are both nonsingular, so is
$AB$, and $(AB)^{-1} = B^{-1}A^{-1}$.
Let $A^{m \times n} = (\alpha_1', \alpha_2', \ldots, \alpha_n')$. Then the column vectors $\alpha_j'$ may be considered
to be vectors in $V_m$. The rank of $A$ is the maximum number of linearly
independent vectors in $\{\alpha_1', \alpha_2', \ldots, \alpha_n'\}$, that is, it is the dimension of the
vector space spanned by the columns of $A$. It is well known (Scheffé [111],
394) that rank $A^{m \times n}$ = maximum number of linearly independent rows $\le$
$\min(m, n)$ and rank $AB \le \min(\text{rank } A, \text{rank } B)$. Moreover, if $A$ is $m \times n$
and $P^{m \times m}$ and $Q^{n \times n}$ are nonsingular, rank $PAQ$ = rank $A$. If rank $A^{m \times n} =
\min(m, n)$, we say that $A$ has full rank.

An $n \times n$ matrix $P$ is said to be orthogonal if $PP' = I_n$; the transformation
$x = yP'$ is then known as an orthogonal transformation.
Theorem 4 (Scheffé [111], 397). Let $A^{n \times n}$ be a symmetric matrix. Then
there exists an orthogonal matrix $P^{n \times n}$ such that $P'AP$ is a diagonal matrix;
that is, the off-diagonal elements of $P'AP$ are all 0.


Let $A$ be $n \times n$. Then $|A - \lambda I_n|$, a polynomial in $\lambda$ of degree $n$, is known
as the characteristic polynomial of $A$. The roots of $|A - \lambda I_n|$ are termed the
characteristic roots of $A$. We say that $A^{n \times n}$ is positive definite if all the
characteristic roots of $A$ are positive, and positive semidefinite if all the
characteristic roots are nonnegative.

Regardless of what orthogonal $P$ is used to diagonalize $A$ in Theorem 4,
the elements $\{\lambda_i\}$ of the diagonal matrix $P'AP$ are always the same except
for order. These $\{\lambda_i\}$ are the roots of the characteristic polynomial.


Theorem 5 (Scheffé [111], 399). For any matrix $A$, the matrix $AA'$ is symmetric
and positive semidefinite and has the same rank as $A$.
1.1 INTRODUCTION


The theory of probability had its origin in gambling and games of chance.
It owes much to the curiosity of gamblers who pestered their friends in the
mathematical world with all sorts of questions. Unfortunately this association
with gambling contributed to a very slow and sporadic growth of
probability theory as a mathematical discipline. The mathematicians of the
day took little or no interest in the development of any theory but looked
only at the combinatorial reasoning involved in each problem.

The first attempt at some mathematical rigor is credited to Laplace. In
his monumental work, Théorie analytique des probabilités (1812), Laplace
gave the classical definition of the probability of an event that can occur
only in a finite number of ways as the proportion of the number of favorable
outcomes to the total number of all possible outcomes, provided that all
the outcomes are equally likely. According to this definition, the computation
of the probability of events was reduced to combinatorial counting
problems. Even in those days, this definition was found inadequate. In
addition to being circular and restrictive, it did not answer the question of
what probability is; it only gave a practical method of computing the
probabilities of some simple events.

An extension of the classical definition of Laplace was used to evaluate
the probabilities of sets of events with infinite outcomes. The notion of
equal likelihood of certain events played a key role in this development.
According to this extension, if $\Omega$ is some region with a well-defined measure
(length, area, volume, etc.), the probability that a point chosen at random
lies in a subregion $A$ of $\Omega$ is the ratio measure($A$) / measure($\Omega$). Many
problems of geometric probability were solved using this extension. The trouble
is that one can define "at random" in any way one pleases, and different
definitions therefore lead to different answers. Joseph Bertrand, for example,
in his book Calcul des probabilités (Paris, 1889) cited a number of problems
in geometric probability where the result depended on the method of solution.
In Example 1.3.9 we will discuss the famous Bertrand paradox and
show that in reality there is nothing paradoxical about Bertrand's paradoxes;
once we define "probability spaces" carefully, the paradox is resolved.
Nevertheless, difficulties encountered in the field of geometric probability
have been largely responsible for the slow growth of probability theory and
its tardy acceptance by mathematicians as a mathematical discipline.

The mathematical theory of probability, as we know it today, is of
comparatively recent origin. It was A. N. Kolmogorov who axiomatized
probability in his fundamental work, Foundations of the Theory of
Probability (Berlin), in 1933. According to this development, random events are
represented by sets and probability is just a normed measure defined on these
sets. This measure-theoretic development not only provided a logically
consistent foundation for probability theory but also, at the same time,
joined it to the mainstream of modern mathematics.

In this book, we follow Kolmogorov's axiomatic development. In Section
2 we introduce the notion of a sample space. In Section 3 we state
Kolmogorov's axioms of probability and study some simple consequences of these
axioms. Section 4 is devoted to the computation of probability on finite
sample spaces, Section 5 deals with conditional probability and Bayes's rule,
while Section 6 examines the independence of events.


1.2 SAMPLE SPACE


In most branches of knowledge experiments are a way of life. In probability
and statistics, too, we concern ourselves with special types of experiments.
Consider the following examples.


Example 1. A coin is tossed. Assuming that the coin does not land on
the side, there are two possible outcomes of the experiment: heads and tails.
On any performance of this experiment one does not know what the outcome
will be. The coin can be tossed as many times as desired.


Example 2. A roulette wheel is a circular disk divided into 38 equal sectors
numbered from 0 to 36 and 00. A ball is rolled on the edge of the wheel,
and the wheel is rolled in the opposite direction. One bets on any of the 38
numbers or some combinations of them. One can also bet on a color, red
or black. If the ball lands in the sector numbered 32, say, anybody who bet
on 32 or combinations including 32 wins, and so on. In this experiment,
all possible outcomes are known in advance, namely 00, 0, 1, 2, ..., 36,
but on any performance of the experiment there is uncertainty as to what
the outcome will be, provided, of course, that the wheel is not rigged in any
manner. Clearly, the wheel can be rolled any number of times.


Example 3. A manufacturer produces footrules. The experiment consists in
measuring the length of a footrule produced by the manufacturer as accurately
as possible. Because of errors in the production process one does not
know what the true length of the footrule selected will be. It is clear, however,
that the length will be, say, between 11 and 13 inches, or, if one wants to
be safe, between 6 and 18 inches.


Example 4. The length of life of a light bulb produced by a certain
manufacturer is recorded. In this case one does not know what the length of life
will be for the light bulb selected, but clearly one is aware in advance that
it will be some number between 0 and $\infty$ hours.


The experiments described above have certain common features. For each
experiment, we know in advance all possible outcomes, that is, there are no
surprises in store after any performance of the experiment. On any performance
of the experiment, however, we do not know what the specific outcome
will be, that is, there is uncertainty about the outcome on any performance
of the experiment. Moreover, the experiment can be repeated under identical
conditions. These features describe a random (or a statistical) experiment.


Definition 1. A random (or a statistical) experiment is an experiment in
which

(a) all outcomes of the experiment are known in advance,

(b) any performance of the experiment results in an outcome that is not
known in advance, and

(c) the experiment can be repeated under identical conditions.


In probability theory we study this uncertainty of a random experiment.
It is convenient to associate with each such experiment a set $\Omega$, the set of all
possible outcomes of the experiment. To engage in any meaningful discussion
about the experiment, we associate with $\Omega$ a $\sigma$-field $\mathcal{S}$ of subsets of $\Omega$.


Definition 2. The sample space of a statistical experiment is a pair $(\Omega, \mathcal{S})$,
where
(a) $\Omega$ is the set of all possible outcomes of the experiment, and
(b) $\mathcal{S}$ is a $\sigma$-field of subsets of $\Omega$.


We recall that a $\sigma$-field is a nonempty class of subsets of $\Omega$ that is closed
under the formation of countable unions and complements and contains the
null set $\emptyset$.
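For a finite set of outcomes the defining properties can be checked mechanically. The following Python sketch is an illustration under stated assumptions: the function name `is_sigma_field` is hypothetical, and because the outcome set is finite, closure under countable unions reduces to closure under pairwise unions.

```python
def is_sigma_field(omega, family):
    """Check the sigma-field properties for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in family}
    if not sets or any(not s <= omega for s in sets):
        return False                      # nonempty class of subsets of omega
    if frozenset() not in sets:
        return False                      # contains the null set
    if any(omega - s not in sets for s in sets):
        return False                      # closed under complements
    # On a finite omega, closure under pairwise unions gives closure
    # under all (finite, hence all countable) unions.
    return all(a | b in sets for a in sets for b in sets)


coin = {"H", "T"}
trivial = [set(), coin]
all_subsets = [set(), {"H"}, {"T"}, coin]
not_closed = [set(), {"H"}, coin]          # complement of {"H"} is missing
print(is_sigma_field(coin, trivial),
      is_sigma_field(coin, all_subsets),
      is_sigma_field(coin, not_closed))
```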
The elements of $\Omega$ are called sample points. Any set $A \in \mathcal{S}$ is known as
an event. Clearly $A$ is a collection of sample points. We say that an event $A$
happens if the outcome of the experiment corresponds to a point in $A$. Each
one-point set is known as a simple or an elementary event. If the set $\Omega$
contains only a finite number of points, we say that $(\Omega, \mathcal{S})$ is a finite sample
space. If $\Omega$ contains at most a countable number of points, we call $(\Omega, \mathcal{S})$ a
discrete sample space. If, however, $\Omega$ contains uncountably many points, we
say that $(\Omega, \mathcal{S})$ is an uncountable sample space. In particular, if $\Omega = \mathcal{R}_k$ or
some rectangle in $\mathcal{R}_k$, we call it a continuous sample space.


Remark 1. The choice of $\mathcal{S}$ is an important one, and some remarks are in
order. If $\Omega$ contains at most a countable number of points, we can always
take $\mathcal{S}$ to be the class of all subsets of $\Omega$. This is certainly a $\sigma$-field. Each
one-point set is a member of $\mathcal{S}$ and is the fundamental object of interest.
Every subset of $\Omega$ is an event. If $\Omega$ has uncountably many points, the class
of all subsets of $\Omega$ is still a $\sigma$-field, but it is much too large a class of sets
to be of interest. It may not be possible to choose the class of all subsets of
$\Omega$ as $\mathcal{S}$. One of the most important examples of an uncountable sample
space is the case in which $\Omega = \mathcal{R}$ or $\Omega$ is an interval in $\mathcal{R}$. In this case we
would like all one-point subsets of $\Omega$ and all intervals (closed, open, or
semiclosed) to be events. We use our knowledge of analysis to specify $\mathcal{S}$.
We will not go into details here except to recall (Theorem P.1.2) that the
class of all semiclosed intervals $(a, b]$ generates a class $\mathfrak{B}_1$, which is a $\sigma$-field
on $\mathcal{R}$. This class contains all one-point sets and all intervals (finite or infinite).
We take $\mathcal{S} = \mathfrak{B}_1$. There are many subsets of $\mathcal{R}$ that are not in $\mathfrak{B}_1$, but we
will not demonstrate this fact here. We refer the reader to Halmos [45],
Royden [107], or Kolmogorov and Fomin [61] for further details.

We will frequently use the fact that every Borel set $B \in \mathfrak{B}_1$ can be obtained
by a countable number of operations of unions, intersections, and differences
on intervals. The following relations will be frequently used in subsequent
sections:

  $\{x\} = \bigcap_{n=1}^{\infty} (x - 1/n, x]$,
  $(x, y) = (x, y] - \{y\}$,
  $[x, y] = (x, y] + \{x\}$,
  $[x, y) = \{x\} + (x, y] - \{y\}$,
  $(x, +\infty) = (-\infty, x]^c$,

and so on. Similar remarks apply when $\Omega = \mathcal{R}_k$ or a rectangle in $\mathcal{R}_k$; in that
case we take $\mathcal{S} = \mathfrak{B}_k$. Since we will be dealing mostly with the one-dimensional case, we
will write $\mathfrak{B}$ instead of $\mathfrak{B}_1$.


Example 5. Let us toss a coin. The set $\Omega$ is the set of symbols H and T,
where H denotes head and T represents tail. Also, $\mathcal{S}$ is the class of all
subsets of $\Omega$; namely, $\{\{H\}, \{T\}, \{H, T\}, \emptyset\}$. If the coin is tossed two times,
then

  $\Omega = \{(H, H), (H, T), (T, H), (T, T)\}$,

  $\mathcal{S} = \{\emptyset, \{(H, H)\}, \{(H, T)\}, \{(T, H)\}, \{(T, T)\},$
    $\{(H, H), (H, T)\}, \{(H, H), (T, H)\}, \{(H, H), (T, T)\},$
    $\{(H, T), (T, H)\}, \{(H, T), (T, T)\}, \{(T, H), (T, T)\},$
    $\{(H, H), (H, T), (T, H)\}, \{(H, H), (H, T), (T, T)\},$
    $\{(H, H), (T, H), (T, T)\}, \{(H, T), (T, H), (T, T)\}, \Omega\}$,

where the first element of a pair denotes the outcome of the first toss, and
the second element, the outcome of the second toss. The event at least one
head consists of sample points (H, H), (H, T), (T, H). The event at most
one head is the collection of sample points (H, T), (T, H), (T, T).
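The two-toss sample space of Example 5 is small enough to enumerate. The sketch below is illustrative only (the helper name `power_set` is an assumption): it builds the class of all subsets of the four outcomes and picks out the two events named in the example.

```python
from itertools import chain, combinations, product


def power_set(points):
    """Return every subset of the given finite collection as a frozenset."""
    pts = list(points)
    subsets = chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))
    return [frozenset(s) for s in subsets]


# Outcomes of two tosses: first coordinate is the first toss.
omega = list(product("HT", repeat=2))
events = power_set(omega)

at_least_one_head = frozenset(w for w in omega if "H" in w)
at_most_one_head = frozenset(w for w in omega if w.count("H") <= 1)

print(len(events))                 # number of events in the class of all subsets
print(sorted(at_least_one_head))
print(sorted(at_most_one_head))
```

With four sample points there are $2^4 = 16$ events, and both named events appear among them.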


Example 6. A die is rolled $n$ times. The sample space is the pair $(\Omega, \mathcal{S})$,
where $\Omega$ is the set of all $n$-tuples $(x_1, x_2, \ldots, x_n)$, $x_i \in \{1, 2, 3, 4, 5, 6\}$,
$i = 1, 2, \ldots, n$, and $\mathcal{S}$ is the class of all subsets of $\Omega$. $\Omega$ contains $6^n$ elementary
events. The event $A$ that 1 shows at least once is the set

  $A = \{(x_1, x_2, \ldots, x_n)$: at least one of the $x_i$'s is 1$\}$
    $= \Omega - \{(x_1, x_2, \ldots, x_n)$: none of the $x_i$'s is 1$\}$
    $= \Omega - \{(x_1, x_2, \ldots, x_n): x_i \in \{2, 3, 4, 5, 6\}, i = 1, 2, \ldots, n\}$.
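The complement description of $A$ in Example 6 suggests that $A$ contains $6^n - 5^n$ sample points. A brute-force count for small $n$ agrees with this; the function name below is an assumption made for the illustration.

```python
from itertools import product


def count_at_least_one_ace(n):
    """Count n-tuples of die faces in which the face 1 appears at least once."""
    return sum(1 for outcome in product(range(1, 7), repeat=n) if 1 in outcome)


for n in range(1, 5):
    # Direct enumeration versus |Omega| minus the number of tuples with no 1.
    print(n, count_at_least_one_ace(n), 6 ** n - 5 ** n)
```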


Example 7. A coin is tossed until the first head appears. Then

  $\Omega = \{H, (T, H), (T, T, H), (T, T, T, H), \ldots\}$,

and $\mathcal{S}$ is the class of all subsets of $\Omega$. An equivalent way of writing $\Omega$ would
be to look at the number of tosses required for the first head. Clearly, this
number can take values 1, 2, 3, ..., so that $\Omega$ is the set of all positive integers.
Then $\mathcal{S}$ is the class of all subsets of positive integers.


Example 8. Consider a pointer that is free to spin about the center of a
circle. If the pointer is spun by an impulse, it will finally come to rest at
some point. On the assumption that the mechanism is not rigged in any
manner, each point on the circumference is a possible outcome of the
experiment. The set $\Omega$ consists of all points $0 \le x < 2\pi r$, where $r$ is the radius of
the circle. Every one-point set $\{x\}$ is a simple event, namely, that the pointer
will come to rest at $x$. The events of interest are those in which the pointer
stops at a point belonging to a specified arc. Here $\mathcal{S}$ is taken to be the Borel
$\sigma$-field of subsets of $[0, 2\pi r)$.
Example 9. A rod of length $l$ is thrown onto a flat table, which is ruled with
parallel lines at distance $2l$. The experiment consists in noting whether or
not the rod intersects one of the ruled lines.

Let $r$ denote the distance from the center of the rod to the nearest ruled
line, and let $\theta$ be the angle that the axis of the rod makes with this line
(Fig. 1). Every outcome of this experiment corresponds to a point $(r, \theta)$ in
the plane. As $\Omega$ we take the set of all points $(r, \theta)$ in $\{(r, \theta): 0 \le r \le l,
0 \le \theta < \pi\}$. For $\mathcal{S}$ we take the Borel $\sigma$-field, $\mathfrak{B}_2$, of subsets of $\Omega$, that is,
the smallest $\sigma$-field generated by rectangles of the form

  $\{(x, y): a < x \le b, \; c < y \le d, \; 0 \le a < b \le l, \; 0 \le c < d < \pi\}$.

Clearly the rod will intersect a ruled line if and only if the center of the rod
lies in the area enclosed by the locus of the center of the rod (while one end
touches the nearest line) and the nearest line (shaded area in Fig. 2).


[Fig. 1 and Fig. 2]


Remark 2. From the discussion above it should be clear that in the discrete
case there is really no problem. Every one-point set is also an event, and $\mathcal{S}$
is the class of all subsets of $\Omega$. The problem, if there is any, arises only in
regard to uncountable sample spaces. The reader has to remember only that
in this case not all subsets of $\Omega$ are events. The case of interest is the one in
which $\Omega = \mathcal{R}_k$. In this case roughly all sets that have a well-defined volume
(or area or length) are events. Not every set has the property in question,
but sets that lack it are not easy to find and one does not encounter them
in practice.


PROBLEMS 1.2 


1. A club has five members A, B, C, D, and E. It is required to select a chairman
and a secretary. Assuming that one member cannot occupy both positions, write
the sample space associated with these selections. What is the event that member
A is an office holder?

2. In each of the following experiments, what is the sample space?

(a) In a survey of families with three children, the sexes of the children are recorded
in increasing order of age.

(b) The experiment consists of selecting four items from a manufacturer's output
and observing whether or not each item is defective.

(c) A given book is opened to any page, and the number of misprints is counted.

(d) Two cards are drawn (i) with replacement, (ii) without replacement from
an ordinary deck of cards.

3. Let A, B, C be three arbitrary events on a sample space $(\Omega, \mathcal{S})$. What is the
event that only A occurs? What is the event that at least two of A, B, C occur?
What is the event that both A and C, but not B, occur? What is the event that
at most one of A, B, C occurs?


1.3 PROBABILITY AXIOMS 


Let $(\Omega, \mathcal{S})$ be the sample space associated with a statistical experiment.
In this section we define a probability set function and study some of its
properties.


Definition 1. Let $(\Omega, \mathcal{S})$ be a sample space. A set function $P$ defined on
$\mathcal{S}$ is called a probability measure (or simply probability) if it satisfies the
following conditions:

(i) $P(A) \ge 0$ for all $A \in \mathcal{S}$.

(ii) $P(\Omega) = 1$.

(iii) Let $\{A_j\}$, $A_j \in \mathcal{S}$, $j = 1, 2, \ldots$, be a disjoint sequence of sets, that
is, $A_j \cap A_k = \emptyset$ for $j \ne k$. Then

(1)  $P\left(\sum_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P(A_j)$,

where $\sum_j A_j$ denotes the union of the disjoint sets $A_j$.


We call $P(A)$ the probability of event $A$. If there is no confusion, we
write $PA$ instead of $P(A)$. Property (iii) is called countable additivity. That
$P\emptyset = 0$ and $P$ is also finitely additive follows from it.


Remark 1. If $\Omega$ is discrete and contains at most $n$ ($< \infty$) points, each
single-point set $\{\omega_j\}$, $j = 1, 2, \ldots, n$, is an elementary event, and it is sufficient to
assign probability to each $\{\omega_j\}$. Then, if $A \in \mathcal{S}$, where $\mathcal{S}$ is the class of all
subsets of $\Omega$, $PA = \sum_{\omega \in A} P\{\omega\}$. One such assignment is the equally likely
assignment or the assignment of uniform probabilities. According to this
assignment, $P\{\omega_j\} = 1/n$, $j = 1, 2, \ldots, n$. Thus $PA = m/n$ if $A$ contains $m$
elementary events, $1 \le m \le n$.
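The recipe in this remark (assign a mass to each sample point, then add the masses of the points in an event) can be sketched as follows. The function name `make_probability` and the five-point outcome set are assumptions made for the illustration, and exact fractions are used so that the equally likely assignment reproduces $m/n$ without rounding.

```python
from fractions import Fraction


def make_probability(point_masses):
    """Build P on all subsets of a finite omega from masses on single points."""
    masses = dict(point_masses)
    if any(m < 0 for m in masses.values()):
        raise ValueError("point masses must be nonnegative")
    if sum(masses.values()) != 1:
        raise ValueError("point masses must add up to 1")

    def P(event):
        # PA is the sum of the masses of the sample points in A.
        return sum((masses[w] for w in event), Fraction(0))

    return P


omega = ["w1", "w2", "w3", "w4", "w5"]          # hypothetical sample points
uniform = {w: Fraction(1, len(omega)) for w in omega}
P = make_probability(uniform)

print(P({"w1", "w3"}))    # an event with m = 2 of the n = 5 points
print(P(omega), P(set()))
```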
Remark 2. If $\Omega$ is discrete and contains a countable number of points, one
cannot make an equally likely assignment of probabilities. It suffices to
make the assignment for each elementary event. If $A \in \mathcal{S}$, where $\mathcal{S}$ is the
class of all subsets of $\Omega$, define $PA = \sum_{\omega \in A} P\{\omega\}$.


Remark 3. If $\Omega$ contains uncountably many points, each one-point set is an
elementary event, and again one cannot make an equally likely assignment
of probabilities. Indeed, one cannot assign positive probability to each
elementary event without violating the axiom $P\Omega = 1$. In this case one
assigns probabilities to compound events consisting of intervals. For
example, if $\Omega = [0, 1]$, and $\mathcal{S}$ is the Borel $\sigma$-field of all subsets of $\Omega$, the
assignment $P[I]$ = length of $I$, where $I$ is a subinterval of $\Omega$, defines a
probability.


Definition 2. The triple $(\Omega, \mathcal{S}, P)$ is called a probability space.


Definition 3. Let $A \in \mathcal{S}$; we say that the odds for $A$ are $a$ to $b$ if
$PA = a/(a + b)$, and then the odds against $A$ are $b$ to $a$.
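The correspondence between odds and probability in Definition 3 is easy to mechanize. The two helper names below are hypothetical; the sketch returns the odds in lowest terms and uses exact fractions.

```python
from fractions import Fraction


def odds_for(p):
    """Return (a, b) in lowest terms such that p = a / (a + b)."""
    p = Fraction(p)
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie in [0, 1]")
    return p.numerator, p.denominator - p.numerator


def probability_from_odds(a, b):
    """Return the probability of an event whose odds are a to b."""
    return Fraction(a, a + b)


print(odds_for(Fraction(2, 3)))        # odds for an event of probability 2/3
print(probability_from_odds(1, 3))     # probability when the odds are 1 to 3
```

The odds against the same event are obtained by swapping the two entries of the pair.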


Example 1. Let us toss a coin. The sample space is $(\Omega, \mathcal{S})$, where
$\Omega = \{H, T\}$, and $\mathcal{S}$ is the $\sigma$-field of all subsets of $\Omega$. Let us define $P$ on $\mathcal{S}$
as follows:

  $P\{H\} = 1/2$, $P\{T\} = 1/2$.

Then $P$ clearly defines a probability. Similarly, $P\{H\} = 2/3$, $P\{T\} = 1/3$,
and $P\{H\} = 1$, $P\{T\} = 0$ are probabilities defined on $\mathcal{S}$. Indeed,

  $P\{H\} = p$ and $P\{T\} = 1 - p$ $(0 \le p \le 1)$

defines a probability on $(\Omega, \mathcal{S})$.


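Example 2. Let $\Omega = \{1, 2, 3, \ldots\}$ be the set of positive integers, and let $\mathcal{S}$
be the class of all subsets of $\Omega$. Define $P$ on $\mathcal{S}$ as follows:

  $P\{i\} = \dfrac{1}{2^i}$, $i = 1, 2, \ldots$.

Then $\sum_{i=1}^{\infty} P\{i\} = 1$, and $P$ defines a probability.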
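A quick numerical check shows how the total mass in Example 2 approaches 1. The sketch assumes the assignment $P\{i\} = 2^{-i}$ (the printed formula is damaged in this copy, so this reading is an assumption), and the function names are hypothetical.

```python
from fractions import Fraction


def point_mass(i):
    """Assumed assignment P{i} = 1 / 2**i for the positive integer i."""
    return Fraction(1, 2 ** i)


def partial_sum(n):
    """Total probability carried by the first n positive integers."""
    return sum((point_mass(i) for i in range(1, n + 1)), Fraction(0))


for n in (1, 2, 5, 10, 20):
    # Under the assumed assignment the partial sum equals 1 - 2**(-n).
    print(n, partial_sum(n), float(1 - partial_sum(n)))
```

The mass left over after $n$ terms is $2^{-n}$, which tends to zero, so the masses add up to 1.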




Example 3. Let $\Omega = (0, \infty)$ and $\mathcal{S} = \mathfrak{B}$, the Borel $\sigma$-field on $\Omega$. Define $P$
as follows: for each interval $I \subseteq \Omega$,

  $PI = \int_I e^{-x} \, dx$.

Clearly $PI \ge 0$, $P\Omega = 1$, and $P$ is countably additive by properties of
integrals.


Theorem 1. $P$ is monotone and subtractive; that is, if $A, B \in \mathcal{S}$ and
$A \subseteq B$, then $PA \le PB$ and $P(B - A) = PB - PA$.


Proof. If $A \subseteq B$, then
  $B = (A \cap B) + (B - A) = A + (B - A)$,
and it follows that $PB = PA + P(B - A)$.


Corollary. For all $A \in \mathcal{S}$, $0 \le PA \le 1$.

Remark 4. We wish to emphasize at this point that, if $PA = 0$ for some
$A \in \mathcal{S}$, we call $A$ an event with zero probability or a null event. However, it
does not follow that $A = \emptyset$. Similarly, if $PB = 1$ for some $B \in \mathcal{S}$, we call
$B$ a certain event but it does not follow that $B = \Omega$.

Theorem 2 (The Addition Rule). If $A, B \in \mathcal{S}$, then

(2)  $P(A \cup B) = PA + PB - P(A \cap B)$.

Proof.

  $A \cup B = (A - B) + (B - A) + (A \cap B)$,
and
  $A = (A \cap B) + (A - B)$, $B = (A \cap B) + (B - A)$.
The result follows by countable additivity of $P$.


Corollary 1. $P$ is subadditive, that is, if $A, B \in \mathcal{S}$, then
(3)  $P(A \cup B) \le PA + PB$.
Corollary 1 can be extended to an arbitrary number of events $A_j$:
(4)  $P\left(\bigcup_j A_j\right) \le \sum_j PA_j$.
Corollary 2. If $B = A^c$, then $A$ and $B$ are disjoint and
(5)  $PA = 1 - PA^c$.


The following generalization of (2) is left as an exercise.
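Theorem 3 (The Principle of Inclusion-Exclusion). Let $A_1, A_2, \ldots, A_n \in \mathcal{S}$.
Then

(6)  $P\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} PA_k - \sum_{k_1 < k_2} P(A_{k_1} \cap A_{k_2})$
       $+ \sum_{k_1 < k_2 < k_3} P(A_{k_1} \cap A_{k_2} \cap A_{k_3})$
       $+ \cdots + (-1)^{n+1} P\left(\bigcap_{k=1}^{n} A_k\right)$.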
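On a finite sample space with equally likely outcomes the inclusion-exclusion formula can be checked by direct counting. The sketch below is illustrative: the two-dice sample space and the three events are assumptions chosen for the check, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Two rolls of a die, all 36 outcomes equally likely.
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """Uniform probability of an event given as a set of outcomes."""
    return Fraction(len(event), len(omega))


def inclusion_exclusion(events):
    """Alternating sum over all k-fold intersections, k = 1, ..., n."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for combo in combinations(events, k):
            total += sign * prob(frozenset.intersection(*combo))
    return total


A1 = frozenset(w for w in omega if w[0] <= 2)          # first roll at most 2
A2 = frozenset(w for w in omega if w[1] >= 5)          # second roll at least 5
A3 = frozenset(w for w in omega if w[0] + w[1] == 7)   # total equals 7
events = [A1, A2, A3]

lhs = prob(frozenset.union(*events))
rhs = inclusion_exclusion(events)
print(lhs, rhs, lhs == rhs)
```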


Example 4. A die is rolled twice. Let all the elementary events in
$\Omega = \{(i, j): i, j = 1, 2, \ldots, 6\}$ be assigned the same probability. Let $A$ be
the event that the first throw shows a number $\le 2$, and $B$, the event that
the second throw shows at least 5. Then

  $A = \{(i, j): 1 \le i \le 2, \; j = 1, 2, \ldots, 6\}$,
  $B = \{(i, j): 5 \le j \le 6, \; i = 1, 2, \ldots, 6\}$,
  $A \cap B = \{(1, 5), (1, 6), (2, 5), (2, 6)\}$;

  $P(A \cup B) = PA + PB - P(A \cap B)$
    $= \frac{1}{3} + \frac{1}{3} - \frac{4}{36} = \frac{5}{9}$.
Example 5. A coin is tossed three times. Let us assign equal probability to
each of the $2^3$ elementary events in $\Omega$. Let $A$ be the event that at least one
head shows up in three throws. Then

  $P(A) = 1 - P(A^c)$
    $= 1 - P(\text{no heads})$
    $= 1 - P(TTT) = \frac{7}{8}$.


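We next derive two useful inequalities.
Theorem 4 (Bonferroni's Inequality). Given $n$ ($> 1$) events $A_1, A_2, \ldots, A_n$,

(7)  $\sum_{i=1}^{n} PA_i - \sum\sum_{i<j} P(A_i \cap A_j) \le P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} PA_i$.


Proof. In view of (4) it suffices to prove the left side of (7). The proof is
by induction. The inequality on the left is true for $n = 2$ since
  $PA_1 + PA_2 - P(A_1 \cap A_2) = P(A_1 \cup A_2)$.


For $n = 3$,

  $P\left(\bigcup_{i=1}^{3} A_i\right) = \sum_{i=1}^{3} PA_i - \sum\sum_{i<j} P(A_i \cap A_j) + P(A_1 \cap A_2 \cap A_3)$,

and the result holds. Assuming that (7) holds for $3 \le m \le n - 1$, we show
that it holds also for $m + 1$:

  $P\left(\bigcup_{i=1}^{m+1} A_i\right) = P\left(\left(\bigcup_{i=1}^{m} A_i\right) \cup A_{m+1}\right)$
    $= P\left(\bigcup_{i=1}^{m} A_i\right) + PA_{m+1} - P\left(A_{m+1} \cap \left(\bigcup_{i=1}^{m} A_i\right)\right)$
    $\ge \sum_{i=1}^{m+1} PA_i - \sum\sum_{i<j \le m} P(A_i \cap A_j) - P\left(\bigcup_{i=1}^{m} (A_i \cap A_{m+1})\right)$
    $\ge \sum_{i=1}^{m+1} PA_i - \sum\sum_{i<j \le m} P(A_i \cap A_j) - \sum_{i=1}^{m} P(A_i \cap A_{m+1})$
    $= \sum_{i=1}^{m+1} PA_i - \sum\sum_{i<j \le m+1} P(A_i \cap A_j)$.

Here the first inequality uses the induction hypothesis and the second uses (4).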
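Theorem 5 (Boole's Inequality). For any two events, $A$ and $B$,
(8)  $P(A \cap B) \ge 1 - PA^c - PB^c$.
Proof. The proof is simple: by (5) and (3),
$P(A \cap B) = 1 - P(A^c \cup B^c) \ge 1 - PA^c - PB^c$.
Corollary 1. Let $\{A_j\}$, $j = 1, 2, \ldots$, be a countable sequence of events; then

(9)  $P\left(\bigcap_{j=1}^{\infty} A_j\right) \ge 1 - \sum_{j=1}^{\infty} P(A_j^c)$.
Proof. Take

  $B = \bigcap_{j=2}^{\infty} A_j$ and $A = A_1$

in (8).
Corollary 2 (The Implication Rule). If $A, B, C \in \mathcal{S}$ and $A$ and $B$ imply $C$,
then
(10)  $PC^c \le PA^c + PB^c$.
Proof. Since $A \cap B \subseteq C$, $A^c \cup B^c \supseteq C^c$, we have from (3)
  $PC^c \le P(A^c \cup B^c) \le PA^c + PB^c$.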


Theorem 6. Let $\{A_n\}$ be a nondecreasing sequence of events in $\mathcal{S}$, that
is, $A_n \in \mathcal{S}$, $n = 1, 2, \ldots$, and

  $A_n \supseteq A_{n-1}$, $n = 2, 3, \ldots$.

Then
(11)  $\lim_{n \to \infty} PA_n = P\left(\lim_{n \to \infty} A_n\right) = P\left(\bigcup_{n=1}^{\infty} A_n\right)$.

Proof. Let
  $A = \bigcup_{j=1}^{\infty} A_j$.
Then

  $A = A_n + \sum_{j=n}^{\infty} (A_{j+1} - A_j)$.

By countable additivity we have

  $PA = PA_n + \sum_{j=n}^{\infty} P(A_{j+1} - A_j)$,

and, letting $n \to \infty$, we see that
  $PA = \lim_{n \to \infty} PA_n + \lim_{n \to \infty} \sum_{j=n}^{\infty} P(A_{j+1} - A_j)$.


The second term on the right tends to zero as $n \to \infty$ since the sum
$\sum_{j=1}^{\infty} P(A_{j+1} - A_j) \le 1$, and the result follows.


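Corollary. Let $\{A_n\}$ be a nonincreasing sequence of events in $\mathcal{S}$. Then

(12)  $\lim_{n \to \infty} PA_n = P\left(\lim_{n \to \infty} A_n\right) = P\left(\bigcap_{n=1}^{\infty} A_n\right)$.


Proof. Write $A = \bigcap_{n=1}^{\infty} A_n$ and consider the nondecreasing sequence of events
$\{A_n^c\}$. Then
  $\lim_{n \to \infty} A_n^c = \bigcup_{j=1}^{\infty} A_j^c = A^c$.
It follows from Theorem 6 that
  $\lim_{n \to \infty} PA_n^c = P\left(\lim_{n \to \infty} A_n^c\right) = P\left(\bigcup_{j=1}^{\infty} A_j^c\right) = P(A^c)$.
In other words,
  $\lim_{n \to \infty} (1 - PA_n) = 1 - PA$,

as asserted.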
Remark 5. Theorem 6 and its corollary will be used quite frequently in
subsequent chapters. Property (11) is called the continuity of $P$ from below,
and (12) is known as the continuity of $P$ from above. Thus Theorem 6 and
its corollary assure us that the set function $P$ is continuous from above and
below.


We conclude this section with some remarks concerning the use of the
word "random" in this book. In probability theory "random" has essentially
three meanings. First, in sampling from a finite population a sample is said
to be a random sample if at each draw all members available for selection
have the same probability of being included. We will discuss sampling from
a finite population in Section 1.4. Second, we speak of a random sample
from a probability distribution. This notion is formalized in Section 7.2. The
third meaning arises in the context of geometric probability, where statements
like "a point is randomly chosen from the interval $(a, b)$" and "a point is
picked randomly from a unit square" are frequently encountered. Once we
have studied random variables and their distributions, problems involving
geometric probabilities may be formulated in terms of problems involving
independent uniformly distributed random variables, and these statements
can be given appropriate interpretations.

Roughly speaking, these statements involve a certain assignment of
probability. The word "random" expresses our desire to assign equal probability
to sets of equal lengths, areas, or volumes. Let $\Omega \subseteq \mathcal{R}_n$ be a given set, and
$A$ be a subset of $\Omega$. We are interested in the probability that a "randomly
chosen point" in $\Omega$ falls in $A$. Here "randomly chosen" means that the point
may be any point of $\Omega$ and that the probability of its falling in some subset
$A$ of $\Omega$ is proportional to the measure of $A$ (independently of the location
and shape of $A$). Assuming that both $A$ and $\Omega$ have well-defined finite
measures (length, area, volume, etc.), we define

  $PA = \dfrac{\text{measure}(A)}{\text{measure}(\Omega)}$.

(In the language of measure theory we are assuming that $\Omega$ is a measurable
subset of $\mathcal{R}_n$ that has a finite, positive Lebesgue measure. If $A$ is any
measurable set, $PA = \mu(A)/\mu(\Omega)$, where $\mu$ is the $n$-dimensional Lebesgue
measure.) Thus, if a point is chosen at random from the interval $(a, b)$, the
probability that it lies in the interval $(c, d)$, $a \le c < d \le b$, is $(d - c)/(b - a)$.
We present some examples.


Example 6. A point is picked "at random" from a unit square. Let
$\Omega = \{(x, y): 0 \le x \le 1, 0 \le y \le 1\}$. It is clear that all rectangles and their
unions must be in $\mathcal{S}$. So too should all circles in the unit square, since the
area of a circle is also well defined. Indeed every set that has a well-defined
area has to be in $\mathcal{S}$. We choose $\mathcal{S} = \mathfrak{B}_2$, the Borel $\sigma$-field generated by
rectangles in $\Omega$. As for the probability assignment, if $A \in \mathcal{S}$, we assign $PA$ to
$A$, where $PA$ is the area of the set $A$. If $A = \{(x, y): 0 \le x \le 1/2,
1/2 \le y \le 1\}$, then $PA = 1/4$. If $B$ is a circle with center $(1/2, 1/2)$ and
radius $1/2$, then $PB = \pi(1/2)^2 = \pi/4$. If $C$ is the set of all points which are
at most a unit distant from the origin, then $PC = \pi/4$. (See Figs. 1 to 3.)

[Figs. 1 to 3: the unit square with vertices (0,0), (1,0), (1,1), (0,1), showing the sets A, B, and C.]


Example 7 (Buffon's Needle Problem). We return to Example 1.2.9. A
needle (rod) of length $l$ is tossed at random on a plane that is ruled with a
series of parallel lines at distance $2l$ apart. We wish to find the probability
that the needle will intersect one of the lines. Denoting by $r$ the distance
from the center of the needle to the closest line and by $\theta$ the angle that the
needle forms with this line, we see that a necessary and sufficient condition
for the needle to intersect the line is that $r \le (l/2) \sin \theta$. The needle will
intersect the nearest line if and only if its center falls in the shaded region
in Fig. 1.2.2. We assign probability to an event $A$ as follows:

  $PA = \dfrac{\text{Area of set } A}{l\pi}$.

Thus the required probability is

  $\dfrac{1}{l\pi} \int_0^{\pi} \dfrac{l}{2} \sin \theta \, d\theta = \dfrac{1}{\pi}$.


Here we have interpreted "at random" to mean that the position of the needle
is characterized by a point $(r, \theta)$ which lies in the rectangle $0 \le r \le l$,
$0 \le \theta \le \pi$. We have assumed that the probability that the point $(r, \theta)$ lies
in any arbitrary subset of this rectangle is proportional to the area of this
set. Roughly, this means that "all positions of the midpoint of the needle
are assigned the same weight and all directions of the needle are assigned
the same weight."
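The answer $1/\pi$ in Example 7 can be checked by simulation. The sketch below follows the interpretation of "at random" stated in the example: the point $(r, \theta)$ is drawn uniformly from the rectangle $0 \le r \le l$, $0 \le \theta \le \pi$, and the needle crosses a line when $r \le (l/2)\sin\theta$. The function name, seed, and number of trials are arbitrary assumptions.

```python
import math
import random


def buffon_estimate(l, trials, rng):
    """Estimate the crossing probability for a needle of length l, lines 2l apart."""
    hits = 0
    for _ in range(trials):
        r = rng.uniform(0.0, l)            # distance from center to nearest line
        theta = rng.uniform(0.0, math.pi)  # angle between needle and the line
        if r <= (l / 2.0) * math.sin(theta):
            hits += 1
    return hits / trials


rng = random.Random(2024)                  # fixed seed so the run is repeatable
estimate = buffon_estimate(l=1.0, trials=200_000, rng=rng)
print(estimate, 1 / math.pi)
```

The estimate does not depend on the value chosen for `l`, since both the rectangle and the crossing region scale with it.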


Example 8. An interval of length 1, say $(0, 1)$, is divided into three intervals
by choosing two points at random. What is the probability that the three
line segments form a triangle?

It is clear that a necessary and sufficient condition for the three segments
to form a triangle is that the length of any one of the segments be less than
the sum of the other two. Let $x$, $y$ be the abscissas of the two points chosen
at random. Then we must have either

  $0 < x < \frac{1}{2} < y < 1$ and $y - x < \frac{1}{2}$,

or

  $0 < y < \frac{1}{2} < x < 1$ and $x - y < \frac{1}{2}$.


This is precisely the shaded area in Fig. 4. It follows that the required
probability is 1/4.

[Fig. 4: the unit square with vertices (0,0), (1,0), (1,1), (0,1).]
If it is specified in advance that the point $x$ is chosen at random from
$(0, 1/2)$, and the point $y$ at random from $(1/2, 1)$, we must have

  $0 < x < \frac{1}{2}$, $\frac{1}{2} < y < 1$,
and

  $y - x < x + 1 - y$ or $2(y - x) < 1$.


In this case the area bounded by these lines is the shaded area in Fig. 5, and
it follows that the required probability is 1/2.
Note the difference in sample spaces in the two computations made above.
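The two answers in Example 8 (1/4 and 1/2) can be compared by simulation. The sketch below is an illustration with hypothetical function names: since the three lengths add up to 1, the segments form a triangle exactly when the longest one is shorter than 1/2, and the two samplers correspond to the two sample spaces used in the example.

```python
import random


def forms_triangle(x, y):
    """True when the cut points x, y split (0, 1) into three triangle sides."""
    a, b = min(x, y), max(x, y)
    sides = (a, b - a, 1.0 - b)
    # The lengths sum to 1, so each is below the sum of the other two
    # exactly when the longest is below 1/2.
    return max(sides) < 0.5


def estimate(sampler, trials, rng):
    """Fraction of sampled pairs of cut points that give a triangle."""
    return sum(forms_triangle(*sampler(rng)) for _ in range(trials)) / trials


def both_anywhere(rng):
    return rng.random(), rng.random()


def one_in_each_half(rng):
    return rng.uniform(0.0, 0.5), rng.uniform(0.5, 1.0)


rng = random.Random(12345)                 # fixed seed so the run is repeatable
p_unrestricted = estimate(both_anywhere, 200_000, rng)
p_restricted = estimate(one_in_each_half, 200_000, rng)
print(p_unrestricted, p_restricted)
```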


[Fig. 5]


Example 9 (Bertrand's Paradox). A chord is drawn at random in the unit
circle. What is the probability that the chord is longer than the side of the
equilateral triangle inscribed in the circle?

We present here two of several solutions to this problem, depending on
how we interpret the phrase "at random." The paradox is resolved once we
define the probability spaces carefully.


Solution 1. Since the length of a chord is uniquely determined by the
position of its midpoint, choose a point $C$ at random in the circle and draw
a line through $C$ and $O$, the center of the circle (Fig. 6). Draw the chord
through $C$ perpendicular to the line $OC$. If $l_1$ is the length of the chord with
$C$ as midpoint, $l_1 > \sqrt{3}$ if and only if $C$ lies inside the circle with center $O$
and radius 1/2. Thus $PA = \pi(1/2)^2/\pi = 1/4$.

In this case $\Omega$ is the circle with center $O$ and radius one, and the event
$A$ is the concentric circle with center $O$ and radius 1/2. $\mathcal{S}$ is the usual Borel
$\sigma$-field of subsets of $\Omega$.

[Fig. 6]


Solution 2. Because of symmetry, we may fix one end point of the chord
at some point P and then choose the other end point $P_1$ at random. Let the
probability that $P_1$ lies on an arbitrary arc of the circle be proportional
to the length of this arc. Now the inscribed equilateral triangle having P as
one of its vertices divides the circumference into three equal parts. A chord
drawn through P will be longer than the side of the triangle if and only if
the other end point $P_1$ (Fig. 7) of the chord lies on that one third of the
circumference that is opposite to P. It follows that the required probability
is 1/3. In this case $\Omega = [0, 2\pi]$, $\mathcal{S} = \mathfrak{B}_1 \cap \Omega$,
and $A = [2\pi/3, 4\pi/3]$.

Fig. 7
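
The dependence of the answer on the probability space can be illustrated by simulation. The sketch below is an illustration only; the function names, seed, and trial count are assumptions of this example. It draws chords according to each of the two interpretations and estimates the probability that the chord is longer than $\sqrt{3}$.

```python
import math
import random

SIDE = math.sqrt(3.0)  # side of the equilateral triangle inscribed in the unit circle

def chord_from_midpoint(rng):
    """Solution 1: midpoint uniform over the unit disk (rejection sampling)."""
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        d2 = x * x + y * y
        if d2 <= 1.0:
            # A chord whose midpoint is at distance d from O has length 2*sqrt(1 - d^2).
            return 2.0 * math.sqrt(1.0 - d2)

def chord_from_endpoint(rng):
    """Solution 2: one end point fixed, the other uniform on the circumference."""
    theta = rng.uniform(0.0, 2.0 * math.pi)  # arc between the two end points
    return 2.0 * math.sin(theta / 2.0)

def estimate(chord, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(chord(rng) > SIDE for _ in range(trials)) / trials

print(estimate(chord_from_midpoint))  # close to 1/4
print(estimate(chord_from_endpoint))  # close to 1/3
```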
PROBLEMS 1.3 


1. Let $\Omega$ be the set of all nonnegative integers, and $\mathcal{S}$ the class of
all subsets of $\Omega$. In each of the following cases does P define a probability
measure on $(\Omega, \mathcal{S})$?

(a) For $A \in \mathcal{S}$, let

$$PA = \sum_{x \in A} \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad \lambda > 0.$$

(b) For $A \in \mathcal{S}$, let

$$PA = \sum_{x \in A} p(1 - p)^{x}, \qquad 0 < p < 1.$$

(c) For $A \in \mathcal{S}$, let PA = 1 if A has a finite number of elements, and PA = 0
otherwise.

2. Let $\Omega = \mathcal{R}$ and $\mathcal{S} = \mathfrak{B}$. In each of the following cases does P define a
probability measure on $(\Omega, \mathcal{S})$?

(a) For each interval I, let

$$PI = \int_{I} \frac{1}{\pi} \, \frac{1}{1 + x^{2}} \, dx.$$

(b) For each interval I, let PI = 1 if I is an interval of finite length, and PI = 0
if I is an infinite interval.
(c) For each interval I, let PI = 0 if $I \subseteq (-\infty, 1)$, and $PI = \int_{I} (1/2) \, dx$ if
$I \subseteq [1, \infty)$. (If $I = I_1 + I_2$, where $I_1 \subseteq (-\infty, 1)$ and $I_2 \subseteq [1, \infty)$, then
$PI = PI_2$.)
3. Let A and B be two events such that $B \supseteq A$. What is $P(A \cup B)$? What is
$P(A \cap B)$? What is $P(A - B)$?
4. In Problems 1(a) and 1(b), let A = {all integers > 2}, B = {all nonnegative integers
< 3}, and C = {all integers x, 3 < x < 6}. Find PA, PB, PC, $P(A \cap B)$, $P(A \cup B)$, $P(B \cup C)$,
$P(A \cap C)$, and $P(B \cap C)$.
5. In Problem 2(a) let A be the event A = {x: x > 0}. Find PA.
6. A box contains 1000 light bulbs. The probability that there is at least 1 defective
bulb in the box is .1, and the probability that there are at least 2 defective
bulbs is .05. Find the probability in each of the following cases:
(a) The box contains no defective bulbs. 
(b) The box contains exactly 1 defective bulb. 
(c) The box contains at most 1 defective bulb. 
7. Two points are chosen at random on a line of unit length. Find the probability 
that each of the three line segments so formed will have a length > 1/4. 
8. Find the probability that the sum of two randomly chosen positive numbers
(both < 1) will not exceed 1, and that their product will be $\le 2/9$.
9. Prove Theorem 3.


10. Let $\{A_n\}$ be a sequence of events such that $A_n \to A$ as $n \to \infty$. Show that
$PA_n \to PA$ as $n \to \infty$.


1.4 COMBINATORICS: PROBABILITY ON FINITE SAMPLE SPACES


In this section we restrict our attention to sample spaces that have at most
a finite number of points. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ and $\mathcal{S}$ be the $\sigma$-field of
all subsets of $\Omega$. For any $A \in \mathcal{S}$,

$$PA = \sum_{\omega_j \in A} P\{\omega_j\}.$$

Definition 1. An assignment of probability is said to be equally likely (or
uniform) if each elementary event in $\Omega$ is assigned the same probability.
Thus, if $\Omega$ contains n points $\omega_j$, $P\{\omega_j\} = 1/n$, $j = 1, 2, \ldots, n$.
With this assignment

$$(1) \qquad PA = \frac{\text{number of elementary events in } A}{\text{total number of elementary events in } \Omega}.$$


Example 1. A coin is tossed twice. The sample space consists of four points. 
Under the uniform assignment, each of four elementary events is assigned 
probability 1/4. 

Example 2. Three dice are rolled. The sample space consists of $6^3$ points.
Each one-point set is assigned probability $1/6^3$.


In games of chance we usually deal with finite sample spaces where uniform 
probability is assigned to all simple events. The same is the case in sampling 
schemes. In such instances the computation of the probability of an event A 
reduces to a combinatorial counting problem. We therefore consider some 
rules of counting. 


Rule 1 

Given a collection of $n_1$ elements $a_{11}, a_{12}, \ldots, a_{1 n_1}$, $n_2$ elements $a_{21}, a_{22}, \ldots,
a_{2 n_2}$, and so on, up to $n_k$ elements $a_{k1}, a_{k2}, \ldots, a_{k n_k}$, it is possible to form
$n_1 \cdot n_2 \cdots n_k$ ordered k-tuplets $(a_{1 j_1}, a_{2 j_2}, \ldots, a_{k j_k})$ containing one element of
each kind, $1 \le j_i \le n_i$, $i = 1, 2, \ldots, k$.


Proof. The proof is easy and is left as an exercise. 


Example 3. Here r distinguishable balls are to be placed in n cells. This amounts
to choosing one cell for each ball. The sample space consists of $n^r$ r-tuples
$(i_1, i_2, \ldots, i_r)$, where $i_j$ is the cell number of the jth ball, $j = 1, 2, \ldots, r$
$(1 \le i_j \le n)$.

Consider r tossings with a coin. There are $2^r$ possible outcomes. The
probability that no heads will show up in r throws is $(1/2)^r$. Similarly, the
probability that no 6 will turn up in r throws of a die is $(5/6)^r$.

Rule 2 is concerned with ordered samples. Consider a set of n elements $a_1,
a_2, \ldots, a_n$. Any ordered arrangement $(a_{i_1}, a_{i_2}, \ldots, a_{i_r})$ of r symbols is called
an ordered sample of size r. If elements are selected one by one, there are
two possibilities:

(a) Sampling with replacement: In this case repetitions are permitted,
and we can draw samples of arbitrary size. Clearly there are $n^r$
samples of size r.

(b) Sampling without replacement: In this case an element once chosen is
not replaced, so that there can be no repetitions. Clearly the sample
size cannot exceed n, the size of the population. There are
$n(n - 1) \cdots (n - r + 1) = {}_nP_r$, say, possible samples of size r. Clearly
${}_nP_r = 0$ for integers r > n. If r = n, then ${}_nP_r = n!$.


Rule 2

If ordered samples of size r are drawn from a population of n elements,
there are $n^r$ different samples with replacement and ${}_nP_r$ samples without
replacement.

Corollary. The number of permutations of n objects is n!.
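
For small n and r, Rule 2 can be checked by brute-force enumeration. The sketch below is only an illustration; the helper name `count_ordered_samples` is introduced here and does not come from the text. It lists the ordered samples with `itertools` and compares the counts with $n^r$ and ${}_nP_r$.

```python
from itertools import permutations, product
from math import perm

def count_ordered_samples(n, r):
    """Return (count with replacement, count without replacement) by enumeration."""
    population = range(n)
    with_repl = sum(1 for _ in product(population, repeat=r))
    without_repl = sum(1 for _ in permutations(population, r))
    return with_repl, without_repl

n, r = 5, 3
with_repl, without_repl = count_ordered_samples(n, r)
print(with_repl, n ** r)          # both 125
print(without_repl, perm(n, r))   # both 60 = 5 * 4 * 3
print(count_ordered_samples(3, 4)[1], perm(3, 4))  # both 0, since r > n
```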


Remark. We will frequently use the term "random sample" in this book to
describe the equal assignment of probability to all possible samples in
sampling from a finite population. Thus, when we speak of a random
sample of size r from a population of n elements, it means that each of
$n^r$ samples, in sampling with replacement, has the same probability $1/n^r$
or that each of ${}_nP_r$ samples, in sampling without replacement, is assigned
probability $1/{}_nP_r$.


Example 4. Consider a set of n elements. A sample of size r is drawn at
random with replacement. Then the probability that no element appears
more than once is clearly ${}_nP_r / n^r$.

Thus, if n balls are to be randomly placed in n cells, the probability that
each cell will be occupied is $n!/n^n$.


Example 5. Consider a class of r students. The birthdays of these r
students form a sample of size r from the 365 days in the year. Then the
probability that all r birthdays are different is ${}_{365}P_r / (365)^r$. One can show
that this probability is < 1/2 if r = 23.
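
The claim about r = 23 is easy to verify numerically. The following sketch (the function name is an assumption of this illustration) evaluates ${}_{365}P_r / (365)^r$ as a running product and searches for the smallest class size at which the probability drops below 1/2.

```python
def all_different(r, days=365):
    """Probability that r birthdays drawn uniformly from `days` are all distinct."""
    p = 1.0
    for i in range(r):
        p *= (days - i) / days   # the (i+1)th birthday avoids the previous i
    return p

smallest = next(r for r in range(1, 366) if all_different(r) < 0.5)
print(smallest)                      # 23
print(round(all_different(22), 4))   # 0.5243
print(round(all_different(23), 4))   # 0.4927
```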

Next suppose that each of the r students is asked for his birth date in
order, with the instruction that as soon as a student hears his birth date he
is to raise his hand. Let us compute the probability that a hand is first raised
when the kth $(k = 1, 2, \ldots, r)$ student is asked his birth date. Let $p_k$ be the
probability that the procedure terminates at the kth student. Then

$$p_1 = 1 - \left(\frac{364}{365}\right)^{r-1}$$

and

$$p_k = \frac{{}_{365}P_{k-1}}{(365)^{k-1}} \left(1 - \frac{k-1}{365}\right)^{r-k+1} \left[1 - \left(\frac{365-k}{365-k+1}\right)^{r-k}\right], \qquad k = 2, 3, \ldots, r.$$

Example 6. Let $\Omega$ be the set of all permutations of n objects. Let $A_i$ be the
set of all permutations that leave the ith object unchanged. Then the set
$\bigcup_{i=1}^{n} A_i$ is the set of permutations with at least one fixed point. Clearly

$$PA_i = \frac{(n-1)!}{n!}, \qquad i = 1, 2, \ldots, n,$$

$$P(A_i \cap A_j) = \frac{(n-2)!}{n!}, \qquad i < j; \; i, j = 1, 2, \ldots, n, \text{ etc.}$$

By Theorem 1.3.3 we have

$$P\left(\bigcup_{i=1}^{n} A_i\right) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots \pm \frac{1}{n!}.$$

As an application consider an absent-minded secretary who places n letters
in n envelopes at random. Then the probability that she will misplace every
letter is

$$1 - \left(1 - \frac{1}{2!} + \frac{1}{3!} - \cdots \pm \frac{1}{n!}\right).$$

Rule 3

There are $\binom{n}{r}$ different subpopulations of size $r \le n$ from a population of n
elements, where

$$(2) \qquad \binom{n}{r} = \frac{n!}{r! \, (n - r)!}.$$

Proof. The proof of this result is left as an exercise.


Example 7. Consider the random distribution of r balls in n cells. Let $A_k$
be the event that a specified cell has exactly k balls, $k = 0, 1, 2, \ldots, r$; k
balls can be chosen in $\binom{r}{k}$ ways. We place k balls in the specified cell and
distribute the remaining r − k balls in the n − 1 cells in $(n - 1)^{r-k}$ ways.
Thus

$$PA_k = \binom{r}{k} \frac{(n-1)^{r-k}}{n^{r}} = \binom{r}{k} \left(\frac{1}{n}\right)^{k} \left(1 - \frac{1}{n}\right)^{r-k}.$$


Example 8. There are $\binom{52}{13} = 635,013,559,600$ different hands at bridge, and
$\binom{52}{5} = 2,598,960$ hands at poker.

The probability that all 13 cards in a bridge hand have different face
values is $4^{13} / \binom{52}{13}$.

The probability that a hand at poker contains five different face values
is $\binom{13}{5} 4^{5} / \binom{52}{5}$.
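
These counts are straightforward to reproduce with Python's `math.comb`. The variable names below are chosen for this illustration only.

```python
from math import comb

bridge_hands = comb(52, 13)
poker_hands = comb(52, 5)
print(bridge_hands)  # 635013559600
print(poker_hands)   # 2598960

# One card of each of the 13 face values, with 4 suits to choose from for each.
p_bridge_distinct = 4 ** 13 / bridge_hands
# Choose 5 of the 13 face values, then a suit for each of the 5 cards.
p_poker_distinct = comb(13, 5) * 4 ** 5 / poker_hands

print(f"{p_bridge_distinct:.6f}")  # 0.000106
print(f"{p_poker_distinct:.4f}")   # 0.5071
```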
Rule 4

Consider a population of n elements. The number of ways in which the
population can be partitioned into k subpopulations of sizes $r_1, r_2, \ldots, r_k$,
respectively, $r_1 + r_2 + \cdots + r_k = n$, $0 \le r_i \le n$, is given by

$$(3) \qquad \binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1! \, r_2! \cdots r_k!}.$$

The numbers defined in (3) are known as multinomial coefficients.


Proof. For the proof of Rule 4 one uses Rule 3 repeatedly. Note that

$$\binom{n}{r_1, r_2, \ldots, r_k} = \binom{n}{r_1} \binom{n - r_1}{r_2} \cdots \binom{n - r_1 - \cdots - r_{k-2}}{r_{k-1}}.$$


Example 9. In a game of bridge the probability that a hand of 13 cards
contains 2 spades, 7 hearts, 3 diamonds, and 1 club is

$$\frac{\binom{13}{2} \binom{13}{7} \binom{13}{3} \binom{13}{1}}{\binom{52}{13}}.$$

Example 10. An urn contains 5 red, 3 green, 2 blue, and 4 white balls. A
sample of size 8 is selected at random without replacement. The probability
that the sample contains 2 red, 2 green, 1 blue, and 3 white balls is

$$\frac{\binom{5}{2} \binom{3}{2} \binom{2}{1} \binom{4}{3}}{\binom{14}{8}}.$$
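
As a sketch of how Rules 3 and 4 fit together, the code below computes a multinomial coefficient in two ways, from the factorial formula and by applying Rule 3 repeatedly, and then evaluates the probability in Example 10 exactly. The function names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb, factorial, prod

def multinomial(*sizes):
    """n! / (r_1! r_2! ... r_k!) with n = r_1 + ... + r_k."""
    return factorial(sum(sizes)) // prod(factorial(r) for r in sizes)

def multinomial_via_rule3(*sizes):
    """Same number, built by choosing one subpopulation at a time."""
    remaining, ways = sum(sizes), 1
    for r in sizes:
        ways *= comb(remaining, r)
        remaining -= r
    return ways

print(multinomial(2, 7, 3, 1), multinomial_via_rule3(2, 7, 3, 1))  # 102960 102960

# Example 10: 5 red, 3 green, 2 blue, 4 white; 8 drawn without replacement.
p_example_10 = Fraction(comb(5, 2) * comb(3, 2) * comb(2, 1) * comb(4, 3), comb(14, 8))
print(p_example_10)  # 80/1001
```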
PROBLEMS 1.4 


1. How many different words can be formed by permuting letters of the word
"Mississippi"? How many of these start with the letters "Mi"?


2. An urn contains R red and W white marbles. Marbles are drawn from the urn
one after another without replacement. Let $A_k$ be the event that a red marble is
drawn for the first time on the kth draw. Show that

$$PA_k = \left(\frac{R}{R + W - k + 1}\right) \prod_{j=1}^{k-1} \left(1 - \frac{R}{R + W - j + 1}\right).$$

Let p be the proportion of red marbles in the urn before the first draw. Show that
$PA_k \to p(1 - p)^{k-1}$ as $R + W \to \infty$. Is this to be expected?

3. In a population of N elements, R are red and W = N − R are white. A group
of n elements is selected at random. Find the probability that the group so chosen
will contain exactly r red elements.


4. Each permutation of the digits 1, 2, 3, 4, 5, 6 determines a six-digit number.
If the numbers corresponding to all possible permutations are listed in increasing
order of magnitude, find the 319th number on this list.


5. The numbers $1, 2, \ldots, n$ are arranged in random order. Find the probability
that the digits $1, 2, \ldots, k$ (k < n) appear as neighbors in that order.


6. A pin table has seven holes through which a ball can drop. Five balls are 
played. Assuming that at each play a ball is equally likely to go down any one of 
the seven holes, find the probability that more than one ball goes down at least 
one of the holes. 


7. If 2n boys are divided into two equal subgroups, find the probability that the 
two tallest boys will be (a) in different subgroups, and (b) in the same subgroup. 


8. In a movie theater that can accommodate n + k people, n people are seated.
What is the probability that $r \le n$ given seats are occupied?


9. Waiting in line for a Sunday morning Disney show are 2n children. Tickets
are priced at a quarter each. Find the probability that nobody will have to wait
for change if, before a ticket is sold to the first customer, the cashier has 2k (k < n)
quarters. Assume that it is equally likely that each ticket is paid for with a quarter
or a half-dollar coin.

10. Each box of a certain brand of breakfast cereal contains a small charm, with
k distinct charms forming a set. Assuming that the chance of drawing any
particular charm is equal to that of drawing any other charm, show that the
probability of finding at least one complete set of charms in a random purchase
of $N \ge k$ boxes equals

$$1 - \binom{k}{1} \left(\frac{k-1}{k}\right)^{N} + \binom{k}{2} \left(\frac{k-2}{k}\right)^{N} - \binom{k}{3} \left(\frac{k-3}{k}\right)^{N} + \cdots + (-1)^{k-1} \binom{k}{k-1} \left(\frac{1}{k}\right)^{N}.$$

[Hint: Use (1.3.6).]
11. Prove Rules 1 through 4.


1.5 CONDITIONAL PROBABILITY AND BAYES THEOREM 


So far, we have computed probabilities of events on the assumption that no
information was available about the experiment other than the sample
space. Sometimes, however, it is known that an event H has happened. How
do we use this information in making a statement concerning the outcome
of another event A?
Consider the following examples.


Example 1. Let urn 1 contain one white and two black balls, and urn 2,
one black and two white balls. A fair coin is tossed. If a head turns up, a
ball is drawn at random from urn 1; otherwise, from urn 2. Let E be the
event that the ball drawn is black. The sample space is
$\Omega = \{Hb_{11}, Hb_{12}, Hw_{11}, Tb_{21}, Tw_{21}, Tw_{22}\}$, where H denotes head, T denotes tail,
$b_{ij}$ denotes jth black ball in ith urn, i = 1, 2, and so on. Then

$$PE = P\{Hb_{11}, Hb_{12}, Tb_{21}\} = \tfrac{3}{6} = \tfrac{1}{2}.$$

If, however, it is known that the coin showed a head, the ball could not have
been drawn from urn 2. Thus the probability of E, conditional on information
H, is 2/3. Note that this probability equals the ratio P{Head and ball drawn
black} / P{Head}.


Example 2. Let us toss two fair coins. Then the sample space of the
experiment is $\Omega = \{HH, HT, TH, TT\}$. Let event A = {both coins show same
face} and B = {at least one coin shows H}. Then PA = 2/4. If B is known
to have happened, this information assures that TT cannot happen, and
P{A conditional on the information that B has happened} = 1/3 = (1/4)/(3/4) =
$P(A \cap B)/PB$.


Definition 1. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $H \in \mathcal{S}$ with
PH > 0. For an arbitrary $A \in \mathcal{S}$ we shall write

$$(1) \qquad P\{A \mid H\} = \frac{P(A \cap H)}{PH}$$

and call the quantity so defined the conditional probability of A, given H.
Conditional probability remains undefined when PH = 0.
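
On a finite sample space with equally likely outcomes, the definition reduces to counting. The sketch below is an illustration under that uniform assumption; the helper names `prob` and `cond_prob` are introduced here. It recomputes the two-coin example and refuses to condition on an event of probability zero.

```python
from fractions import Fraction

def prob(event, omega):
    """Uniform probability of `event` on the finite sample space `omega`."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, h, omega):
    """P{A | H} = P(A n H) / PH; raises ValueError when PH = 0."""
    ph = prob(h, omega)
    if ph == 0:
        raise ValueError("conditional probability is undefined when PH = 0")
    return prob(a & h, omega) / ph

omega = {"HH", "HT", "TH", "TT"}
A = {"HH", "TT"}         # both coins show the same face
B = {"HH", "HT", "TH"}   # at least one coin shows H

print(prob(A, omega))          # 1/2
print(cond_prob(A, B, omega))  # 1/3
```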


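Theorem 1. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $H \in \mathcal{S}$ with
PH > 0. Then $(\Omega, \mathcal{S}, P_H)$, where $P_H(A) = P\{A \mid H\}$ for all $A \in \mathcal{S}$, is a
probability space.


Proof. Clearly $P_H(A) = P\{A \mid H\} \ge 0$ for all $A \in \mathcal{S}$. Also,
$P_H(\Omega) = P(\Omega \cap H)/PH = 1$. If $A_1, A_2, \ldots$ is a disjoint sequence of sets in
$\mathcal{S}$, then

$$P_H\left(\sum_{i=1}^{\infty} A_i\right) = \frac{P\left\{\left(\sum_{i=1}^{\infty} A_i\right) \cap H\right\}}{PH} = \frac{\sum_{i=1}^{\infty} P(A_i \cap H)}{PH} = \sum_{i=1}^{\infty} P_H(A_i).$$

Remark 1. What we have done is to consider a new sample space consisting
of the basic set H and the $\sigma$-field $\mathcal{S}_H = \mathcal{S} \cap H$, of subsets $A \cap H$, $A \in \mathcal{S}$,
of H. On this space we have defined a set function $P_H$ by multiplying the
probability of each event by $(PH)^{-1}$. Indeed, $(H, \mathcal{S}_H, P_H)$ is a probability
space.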


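Let A and B be two events with PA > 0, PB > 0. Then it follows from
(1) that

$$(2) \qquad \begin{aligned} P(A \cap B) &= PA \cdot P\{B \mid A\}, \\ P(A \cap B) &= PB \cdot P\{A \mid B\}. \end{aligned}$$

Equations (2) may be generalized to any number of events. Let $A_1, A_2, \ldots,
A_n \in \mathcal{S}$, $n \ge 2$, and assume that $P\left(\bigcap_{j=1}^{n-1} A_j\right) > 0$. Since

$$A_1 \supset (A_1 \cap A_2) \supset (A_1 \cap A_2 \cap A_3) \supset \cdots \supset \left(\bigcap_{j=1}^{n-1} A_j\right),$$

we see that

$$PA_1 > 0, \quad P(A_1 \cap A_2) > 0, \quad \ldots, \quad P\left(\bigcap_{j=1}^{n-2} A_j\right) > 0.$$

It follows that $P\left\{A_k \mid \bigcap_{j=1}^{k-1} A_j\right\}$ are well defined for $k = 2, 3, \ldots, n$.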


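Theorem 2 (The Multiplication Rule). Let $(\Omega, \mathcal{S}, P)$ be a probability space
and $A_1, A_2, \ldots, A_n \in \mathcal{S}$, with $P\left(\bigcap_{j=1}^{n-1} A_j\right) > 0$. Then

$$(3) \qquad P\left\{\bigcap_{j=1}^{n} A_j\right\} = P(A_1) \, P\{A_2 \mid A_1\} \, P\{A_3 \mid A_1 \cap A_2\} \cdots P\left\{A_n \,\Big|\, \bigcap_{j=1}^{n-1} A_j\right\}.$$

Proof. The proof is simple.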


Let us suppose that $\{H_j\}$ is a countable collection of events in $\mathcal{S}$ such that
$H_j \cap H_k = \emptyset$, $j \ne k$, and $\sum_{j=1}^{\infty} H_j = \Omega$. Suppose that $PH_j > 0$ for all j.
Then

$$(4) \qquad PB = \sum_{j=1}^{\infty} P(H_j) \, P\{B \mid H_j\} \qquad \text{for all } B \in \mathcal{S}.$$

For the proof we note that

$$B = \sum_{j=1}^{\infty} (B \cap H_j),$$

and the result follows. Equation (4) is called the total probability rule.

Example 3. Consider a hand of five cards in a game of poker. If the cards
are dealt at random, there are $\binom{52}{5}$ possible hands of five cards each. Let

A = {at least 3 cards of spades},  B = {all 5 cards of spades}.

Then

$$P(A \cap B) = P\{\text{all 5 cards of spades}\}$$

and

$$P\{B \mid A\} = \frac{P(A \cap B)}{P(A)} = \frac{\binom{13}{5} \Big/ \binom{52}{5}}{\left[\binom{13}{3}\binom{39}{2} + \binom{13}{4}\binom{39}{1} + \binom{13}{5}\right] \Big/ \binom{52}{5}}.$$
Example 4. Urn 1 contains one white and two black marbles, urn 2 contains
one black and two white marbles, and urn 3 contains three black and three
white marbles. A die is rolled. If 1, 2, or 3 shows up, urn 1 is selected; if 4
shows up, urn 2 is selected; and if 5 or 6 shows up, urn 3 is selected. A
marble is then drawn at random from the selected urn. Let A be the event
that the marble drawn is white. If U, V, W, respectively, denote the events
that the urn selected is 1, 2, 3, then

$$A = (A \cap U) + (A \cap V) + (A \cap W),$$

$$\begin{aligned} P(A \cap U) &= P(U) \cdot P\{A \mid U\} = \tfrac{3}{6} \cdot \tfrac{1}{3}, \\ P(A \cap V) &= P(V) \cdot P\{A \mid V\} = \tfrac{1}{6} \cdot \tfrac{2}{3}, \\ P(A \cap W) &= P(W) \cdot P\{A \mid W\} = \tfrac{2}{6} \cdot \tfrac{3}{6}. \end{aligned}$$

It follows that

$$PA = \tfrac{1}{6} + \tfrac{1}{9} + \tfrac{1}{6} = \tfrac{4}{9}.$$
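
The urn computation is an instance of the total probability rule over the partition {U, V, W}. A minimal sketch with exact fractions follows; the dictionary layout and the function name are choices made for this illustration.

```python
from fractions import Fraction as F

# P(urn selected), read off the die roll, and P{white | urn}, read off the contents.
p_urn = {"U": F(3, 6), "V": F(1, 6), "W": F(2, 6)}
p_white_given_urn = {"U": F(1, 3), "V": F(2, 3), "W": F(3, 6)}

def total_probability(prior, likelihood):
    """PB = sum over j of P(H_j) P{B | H_j} for a finite partition {H_j}."""
    if sum(prior.values()) != 1:
        raise ValueError("the events H_j must partition the sample space")
    return sum(prior[h] * likelihood[h] for h in prior)

print(total_probability(p_urn, p_white_given_urn))  # 4/9
```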
A simple consequence of the total probability rule is the Bayes theorem,
which we now prove.


Theorem 3 (Bayes Rule). Let $\{H_n\}$ be a disjoint sequence of events such
that $PH_n > 0$, $n = 1, 2, \ldots$, and $\sum_{n=1}^{\infty} H_n = \Omega$. Let $B \in \mathcal{S}$ with PB > 0. Then

$$(5) \qquad P\{H_j \mid B\} = \frac{P(H_j) \, P\{B \mid H_j\}}{\sum_{i=1}^{\infty} P(H_i) \, P\{B \mid H_i\}}, \qquad j = 1, 2, \ldots.$$

Proof. From (2)

$$P(B \cap H_j) = P(B) \, P\{H_j \mid B\} = PH_j \, P\{B \mid H_j\},$$

and it follows that

$$P\{H_j \mid B\} = \frac{PH_j \, P\{B \mid H_j\}}{PB}.$$

The result now follows on using (4).


Remark 2. Suppose that $H_1, H_2, \ldots$ are all the "causes" that lead to the
outcome of a random experiment. Let $H_j$ be the set of outcomes
corresponding to the jth cause. Assume that the probabilities $PH_j$, $j = 1, 2, \ldots$,
called the prior probabilities, can be assigned. Now suppose that the
experiment results in an event B of positive probability. This information leads
to a reassessment of the prior probabilities. The conditional probabilities
$P\{H_j \mid B\}$ are called the posterior probabilities. Formula (5) can be
interpreted as a rule giving the probability that observed event B was due to cause
or hypothesis $H_j$.


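Example 5. In Example 4 let us compute the conditional probability
$P\{V \mid A\}$. We have

$$P\{V \mid A\} = \frac{PV \, P\{A \mid V\}}{PU \, P\{A \mid U\} + PV \, P\{A \mid V\} + PW \, P\{A \mid W\}} = \frac{\tfrac{1}{6} \cdot \tfrac{2}{3}}{\tfrac{3}{6} \cdot \tfrac{1}{3} + \tfrac{1}{6} \cdot \tfrac{2}{3} + \tfrac{2}{6} \cdot \tfrac{3}{6}} = \frac{1/9}{4/9} = \frac{1}{4}.$$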
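
Bayes rule on a finite partition can be written as a short function that returns all posterior probabilities at once. The sketch below is illustrative; the name `posterior` and the dictionary representation are assumptions of this example. It uses the urn data of Example 4.

```python
from fractions import Fraction as F

def posterior(prior, likelihood):
    """Bayes rule on a finite partition: returns {H_j: P{H_j | B}}."""
    pb = sum(prior[h] * likelihood[h] for h in prior)  # total probability rule
    if pb == 0:
        raise ValueError("B has probability zero; the posterior is undefined")
    return {h: prior[h] * likelihood[h] / pb for h in prior}

prior = {"U": F(1, 2), "V": F(1, 6), "W": F(1, 3)}   # prior probabilities of the urns
white = {"U": F(1, 3), "V": F(2, 3), "W": F(1, 2)}   # P{white | urn}

post = posterior(prior, white)
print(post["V"])           # 1/4
print(post["U"], post["W"])  # 3/8 3/8
print(sum(post.values()))  # 1
```

Observing a white marble raises the probability of urn 2 from 1/6 to 1/4 and lowers that of urn 1 from 1/2 to 3/8.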


PROBLEMS 1.5 


1. Let A and B be two events such that $PA = p_1 > 0$, $PB = p_2 > 0$, and
$p_1 + p_2 > 1$. Show that $P\{B \mid A\} \ge 1 - [(1 - p_2)/p_1]$.


2. Two digits are chosen at random without replacement from the set of integers
{1, 2, 3, 4, 5, 6, 7, 8}.

(a) Find the probability that both digits are greater than 5.

(b) Show that the probability that the sum of the digits will be equal to 5 is the
same as the probability that their sum will exceed 13.

4. In Problem 3 let us write

$$p_k = \text{probability of a randomly chosen family having exactly } k \text{ children} = \alpha p^{k}, \qquad k = 1, 2, \ldots,$$

$$p_0 = 1 - \frac{\alpha p}{1 - p}.$$

Suppose that all sex distributions of k children are equally likely. Find the
probability that a family has exactly r boys, $r \ge 1$. Find the conditional probability
that a family has at least two boys, given that it has at least one boy.


5. Each of (N + 1) identical urns marked $0, 1, 2, \ldots, N$ contains N balls. The kth
urn contains k black and N − k white balls, $k = 0, 1, 2, \ldots, N$. An urn is chosen
at random, and n random drawings are made from it, the ball drawn being always
replaced. If all the n draws result in black balls, find the probability that the
(n + 1)th draw will also produce a black ball. How does this probability behave
as $N \to \infty$?


6. Each of n urns contains four white and six black balls, while another urn
contains five white and five black balls. An urn is chosen at random from the (n + 1)
urns, and two balls are drawn from it, both being black. The probability that five
white and three black balls remain in the chosen urn is 1/7. Find n.


7. In answering a question on a multiple choice test, a candidate either knows the
answer with probability p (0 < p < 1) or does not know the answer with
probability 1 − p. If he knows the answer, he puts down the correct answer with
probability .99, whereas if he guesses, the probability of his putting down the correct
result is 1/k (k choices to the answer). Find the conditional probability that the
candidate knew the answer to a question, given that he has made the correct
answer. Show that this probability tends to 1 as $k \to \infty$.


8. An urn contains five white and four black balls. Four balls are transferred to
a second urn. A ball is then drawn from this urn, and it happens to be black. Find
the probability of drawing a white ball from among the remaining three.


9. Prove Theorem 2. 


1.6 INDEPENDENCE OF EVENTS


Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $A, B \in \mathcal{S}$ with PB > 0. By the
multiplication rule we have

$$P(A \cap B) = P(B) \, P\{A \mid B\}.$$

In many experiments the information provided by B does not affect the
probability of event A, that is, $P\{A \mid B\} = P\{A\}$.
Example 1. Let two fair coins be tossed, and let

A = {head on the second throw},  B = {head on the first throw}.

Then

$$P(A) = P\{HH, TH\} = \tfrac{1}{2}, \qquad P(B) = P\{HH, HT\} = \tfrac{1}{2},$$

and

$$P\{A \mid B\} = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/2} = \frac{1}{2} = P(A).$$

Thus

$$P(A \cap B) = P(A) \, P(B).$$

In the following we will write $A \cap B = AB$.


Definition 1. Two events, A and B, are said to be independent if and only if

$$(1) \qquad P(AB) = P(A) \, P(B).$$

Note that we have not placed any restriction on P(A) or P(B). Thus conditional
probability is not defined when P(A) or P(B) = 0, but independence is. Clearly, if
P(A) = 0, then A is independent of every $E \in \mathcal{S}$. Also, any event $A \in \mathcal{S}$ is
independent of $\emptyset$ and $\Omega$.


Theorem 1. If A and B are independent events, then

$$P\{A \mid B\} = P(A) \quad \text{if } P(B) > 0,$$
$$P\{B \mid A\} = P(B) \quad \text{if } P(A) > 0.$$


Theorem 2. If A and B are independent, so are A and $B^c$, $A^c$ and B, and $A^c$
and $B^c$.


Proof.

$$\begin{aligned} P(A^c B) &= P(B - (A \cap B)) \\ &= P(B) - P(A \cap B) \qquad \text{since } B \supseteq (A \cap B) \\ &= P(B) \{1 - P(A)\} \\ &= P(A^c) \, P(B). \end{aligned}$$

Similarly, one proves that $A^c$ and $B^c$, and A and $B^c$, are independent.


We wish to emphasize that independence of events is not to be confused
with disjoint or mutually exclusive events. If two events, each with nonzero
probability, are mutually exclusive, they are obviously dependent since the
occurrence of one will automatically preclude the occurrence of the other.
Similarly, if A and B are independent and PA > 0, PB > 0, then A and B
cannot be mutually exclusive.


Example 2. A card is chosen at random from a deck of 52 cards. Let A be
the event that the card is an ace, and B, the event that it is a club. Then

$$P(A) = \tfrac{4}{52} = \tfrac{1}{13}, \qquad P(B) = \tfrac{13}{52} = \tfrac{1}{4},$$

$$P(AB) = P\{\text{Ace of clubs}\} = \tfrac{1}{52},$$

so that A and B are independent.


Example 3. Consider families with two children, and assume that all four
possible distributions of sex: BB, BG, GB, GG, where B stands for boy
and G for girl, are equally likely. Let E be the event that a randomly chosen
family has at most one girl, and F, the event that the family has children of
both sexes. Then

$$P(E) = \tfrac{3}{4}, \qquad P(F) = \tfrac{1}{2}, \qquad \text{and} \qquad P(EF) = \tfrac{1}{2},$$

so that E and F are not independent.


Now consider families with three children. Assuming that each of the
eight possible sex distributions is equally likely, we have

$$P(E) = \tfrac{4}{8}, \qquad P(F) = \tfrac{6}{8}, \qquad P(EF) = \tfrac{3}{8},$$

so that E and F are independent.
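
Both cases can be checked by listing the sample space. The sketch below is an illustration only; the function name `check_family` is an assumption. It enumerates all equally likely sex distributions for a family of a given size and tests the product rule P(EF) = P(E) P(F).

```python
from fractions import Fraction
from itertools import product

def check_family(n_children):
    """Return (P(E), P(F), P(EF), independent?) for families of the given size."""
    omega = ["".join(s) for s in product("BG", repeat=n_children)]

    def p(event):
        return Fraction(sum(1 for w in omega if event(w)), len(omega))

    def at_most_one_girl(w):      # event E
        return w.count("G") <= 1

    def both_sexes(w):            # event F
        return 0 < w.count("G") < len(w)

    p_e, p_f = p(at_most_one_girl), p(both_sexes)
    p_ef = p(lambda w: at_most_one_girl(w) and both_sexes(w))
    return p_e, p_f, p_ef, p_ef == p_e * p_f

print(check_family(2))  # P(E) = 3/4, P(F) = 1/2, P(EF) = 1/2, not independent
print(check_family(3))  # P(E) = 1/2, P(F) = 3/4, P(EF) = 3/8, independent
```

Whether two events are independent therefore depends on the probability space on which they are defined, not only on their verbal description.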
An obvious extension of the concept of independence between two events
A and B to a given collection $\mathfrak{A}$ of events is to require that any two distinct
events in $\mathfrak{A}$ be independent.

Definition 2. Let $\mathfrak{A}$ be a family of events from $\mathcal{S}$. We say that the events
$\mathfrak{A}$ are pairwise independent if and only if, for every pair of distinct events
$A, B \in \mathfrak{A}$,

$$P(AB) = PA \, PB.$$


A much stronger and more useful concept is mutual or complete indepen- 
dence. 


Definition 3. A family of events $\mathfrak{A}$ is said to be a mutually or completely
independent family if and only if, for every finite subcollection $\{A_{i_1}, A_{i_2}, \ldots,
A_{i_k}\}$ of $\mathfrak{A}$, the following relation holds:

$$(2) \qquad P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = \prod_{j=1}^{k} PA_{i_j}.$$

In what follows we will omit the adjective "mutual" or "complete" and
speak of independent events. It is clear from Definition 3 that in order to
check the independence of n events $A_1, A_2, \ldots, A_n \in \mathcal{S}$ we must check the
following $2^n - n - 1$ relations:

$$\begin{aligned} P(A_i A_j) &= PA_i \, PA_j, & i \ne j; \; i, j &= 1, 2, \ldots, n, \\ P(A_i A_j A_k) &= PA_i \, PA_j \, PA_k, & i \ne j \ne k; \; i, j, k &= 1, 2, \ldots, n, \\ &\;\;\vdots \\ P(A_1 A_2 \cdots A_n) &= PA_1 \, PA_2 \cdots PA_n. \end{aligned}$$

The first of these requirements is pairwise independence. Independence
therefore implies pairwise independence, but not conversely.


Example 4 (Wong [144]). Take four identical marbles. On the first, write
symbols $A_1 A_2 A_3$. On each of the other three, write $A_1$, $A_2$, $A_3$, respectively.
Put the four marbles in an urn and draw one at random. Let $E_i$ denote the
event that the symbol $A_i$ appears on the drawn marble. Then

$$P(E_1) = P(E_2) = P(E_3) = \tfrac{1}{2},$$
$$P(E_1 E_2) = P(E_2 E_3) = P(E_1 E_3) = \tfrac{1}{4},$$

and

$$P(E_1 E_2 E_3) = \tfrac{1}{4}.$$

It follows that, although events $E_1$, $E_2$, $E_3$ are not independent, they are
pairwise independent.
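
The marble example can be verified by checking every one of the $2^3 - 3 - 1 = 4$ product relations. The following sketch is for illustration; the representation of marbles as sets of symbol indices and the helper names are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations

# Four equally likely marbles, each described by the symbol indices written on it.
marbles = [{1, 2, 3}, {1}, {2}, {3}]

def p(indices):
    """Probability that every symbol A_i, i in `indices`, is on the drawn marble."""
    return Fraction(sum(1 for m in marbles if set(indices) <= m), len(marbles))

def product_rule_holds(indices):
    rhs = Fraction(1)
    for i in indices:
        rhs *= p([i])
    return p(indices) == rhs

events = [1, 2, 3]
checks = {c: product_rule_holds(c)
          for k in range(2, len(events) + 1)
          for c in combinations(events, k)}

pairwise = all(ok for c, ok in checks.items() if len(c) == 2)
mutual = all(checks.values())
print(len(checks), pairwise, mutual)  # 4 True False
```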


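Example 5 (Kac [54], 22-23). In this example $P(E_1 E_2 E_3) = P(E_1) \, P(E_2) \, P(E_3)$,
but $E_1$, $E_2$, $E_3$ are not pairwise independent and hence not independent.
Let $\Omega = \{1, 2, 3, 4\}$, and let $p_i$ be the probability assigned to {i}, i = 1,
2, 3, 4. Let $p_1 = \sqrt{2}/2 - 1/4$, $p_2 = 1/4$, $p_3 = 3/4 - \sqrt{2}/2$, $p_4 = 1/4$. Let
$E_1 = \{1, 3\}$, $E_2 = \{2, 3\}$, $E_3 = \{3, 4\}$. Then

$$\begin{aligned} P(E_1 E_2 E_3) = P\{3\} &= \frac{3}{4} - \frac{\sqrt{2}}{2} = \frac{1}{2} \left(1 - \frac{\sqrt{2}}{2}\right) \left(1 - \frac{\sqrt{2}}{2}\right) \\ &= (p_1 + p_3)(p_2 + p_3)(p_3 + p_4) \\ &= P(E_1) \, P(E_2) \, P(E_3). \end{aligned}$$

But $P(E_1 E_2) = 3/4 - \sqrt{2}/2 \ne PE_1 \, PE_2$, and it follows that $E_1$, $E_2$, $E_3$ are not
independent.


Example 6. A die is rolled repeatedly until a 6 turns up. We will show that
event A that "a 6 will eventually show up" is certain to occur. Let $A_k$ be the
event that a 6 will show up for the first time on the kth throw. Then

$$A = \sum_{k=1}^{\infty} A_k \qquad \text{and} \qquad PA_k = \frac{1}{6} \left(\frac{5}{6}\right)^{k-1}, \qquad k = 1, 2, \ldots.$$

Thus

$$PA = \frac{1}{6} \sum_{k=1}^{\infty} \left(\frac{5}{6}\right)^{k-1} = \frac{1}{6} \cdot \frac{1}{1 - 5/6} = 1.$$

Alternatively, we can use the corollary to Theorem 1.3.6. Let $B_n$ be the
event that a 6 does not show up on the first n trials. Clearly $B_{n+1} \subseteq B_n$, and
we have $A^c = \bigcap_{n=1}^{\infty} B_n$. Thus

$$1 - PA = PA^c = P\left(\bigcap_{n=1}^{\infty} B_n\right) = \lim_{n \to \infty} P(B_n) = \lim_{n \to \infty} \left(\frac{5}{6}\right)^{n} = 0.$$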


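Example 7. A slip of paper is given to person A, who marks it with either
a plus or a minus sign; the probability of his writing a plus sign is 1/3. A
passes the slip to B, who may either leave it alone or change the sign before
passing it to C. Next, C passes the slip to D after perhaps changing the
sign; finally, D passes it to a referee after perhaps changing the sign. The
referee sees a plus sign on the slip. It is known that B, C, and D each change
the sign with probability 2/3. We shall compute the probability that A
originally wrote a plus.
Let N be the event that A wrote a plus sign, and M, the event that he
wrote a minus sign. Let E be the event that the referee saw a plus sign on
the slip. We have

$$P\{N \mid E\} = \frac{P(N) \, P\{E \mid N\}}{P(M) \, P\{E \mid M\} + P(N) \, P\{E \mid N\}}.$$

Now

$$P\{E \mid N\} = P\{\text{the plus sign was either not changed or changed exactly twice}\} = \left(\tfrac{1}{3}\right)^{3} + 3 \left(\tfrac{2}{3}\right)^{2} \left(\tfrac{1}{3}\right),$$

and

$$P\{E \mid M\} = P\{\text{the minus sign was changed either once or three times}\} = 3 \left(\tfrac{2}{3}\right) \left(\tfrac{1}{3}\right)^{2} + \left(\tfrac{2}{3}\right)^{3}.$$

It follows that

$$P\{N \mid E\} = \frac{\tfrac{1}{3} \left[\left(\tfrac{1}{3}\right)^{3} + 3 \left(\tfrac{2}{3}\right)^{2} \left(\tfrac{1}{3}\right)\right]}{\tfrac{1}{3} \left[\left(\tfrac{1}{3}\right)^{3} + 3 \left(\tfrac{2}{3}\right)^{2} \left(\tfrac{1}{3}\right)\right] + \tfrac{2}{3} \left[3 \left(\tfrac{2}{3}\right) \left(\tfrac{1}{3}\right)^{2} + \left(\tfrac{2}{3}\right)^{3}\right]} = \frac{13/81}{41/81} = \frac{13}{41}.$$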
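
The value 13/41 can be confirmed by enumerating the eight possible patterns of sign changes made by B, C, and D. The sketch below is an illustration with exact fractions; the constant and function names are assumptions of this example, and it relies on the three changes being made independently, as the example intends.

```python
from fractions import Fraction as F
from itertools import product

P_PLUS = F(1, 3)    # A writes a plus sign
P_CHANGE = F(2, 3)  # each of B, C, D changes the sign

def p_referee_sees_plus(start_plus):
    """P{E | initial sign}, summing over the 2^3 change patterns of B, C, D."""
    total = F(0)
    for flips in product([False, True], repeat=3):
        weight = F(1)
        for flipped in flips:
            weight *= P_CHANGE if flipped else 1 - P_CHANGE
        odd_number_of_changes = sum(flips) % 2 == 1
        ends_plus = start_plus != odd_number_of_changes
        if ends_plus:
            total += weight
    return total

p_e_given_n = p_referee_sees_plus(True)    # 13/27
p_e_given_m = p_referee_sees_plus(False)   # 14/27
p_n_given_e = (P_PLUS * p_e_given_n) / (
    P_PLUS * p_e_given_n + (1 - P_PLUS) * p_e_given_m
)
print(p_e_given_n, p_e_given_m, p_n_given_e)  # 13/27 14/27 13/41
```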
PROBLEMS 1.6


1. A biased coin is tossed until a head appears for the first time. Let p be the
probability of a head, 0 < p < 1. What is the probability that the number of tosses
required is odd? Even?


2. Let A and B be two independent events defined on some probability space,
and let PA = 1/3, PB = 3/4. Find (a) $P(A \cup B)$, (b) $P\{A \mid A \cup B\}$, (c) $P\{B \mid A \cup B\}$.


3. Let $A_1$, $A_2$, $A_3$ be three independent events. Show that $A_1^c$, $A_2^c$, and $A_3^c$ are
independent.


4. A biased coin with probability p, 0 < p < 1, of success (heads) is tossed until
for the first time the same result occurs three times in succession (that is, three
heads or three tails in succession). Find the probability that the game will end at
the seventh throw.


5. A box contains 20 black and 30 green balls. One ball at a time is drawn at
random, its color is noted, and the ball is then replaced in the box for the next
draw.


(a) Find the probability that the first green ball is drawn on the fourth draw.
(b) Find the probability that the third and fourth green balls are drawn on the
sixth and ninth draws, respectively.

(c) Let N be the trial at which the fifth green ball is drawn. Find the probability
that the fifth green ball is drawn on the nth draw. (Note that N takes values
5, 6, 7, ….)


6. An urn contains four red and four black balls. A sample of two balls is drawn
at random. If both balls drawn are of the same color, these balls are set aside and
a new sample is drawn. If the two balls drawn are of different colors, they are
returned to the urn and another sample is drawn. Assume that the draws are
independent and that the same sampling plan is pursued at each stage until all
balls are drawn.


(a) Find the probability that at least n samples are drawn before two balls of the
same color appear.

(b) Find the probability that after the first two samples are drawn four balls are
left, two black and two red.


7. Let A, B, and C be three boxes with three, four, and five cells respectively.
There are three yellow balls numbered 1 to 3, four green balls numbered 1 to 4,
and five red balls numbered 1 to 5. The yellow balls are placed at random in box
A, the green in B, and the red in C, with no cell receiving more than one ball.
Find the probability that only one of the boxes will show no matches.


8. A pond contains red and golden fish. There are 3000 red and 7000 golden fish,
of which 200 and 500, respectively, are tagged. Find the probability that a random
sample of 100 red and 200 golden fish will show 15 and 20 tagged fish, respectively.


9. Let $(\Omega, \mathcal{S}, P)$ be a probability space. Let $A, B, C \in \mathcal{S}$ with PB and PC > 0.
If B and C are independent show that

$$P\{A \mid B\} = P\{A \mid B \cap C\} \, PC + P\{A \mid B \cap C^c\} \, PC^c.$$

Conversely, if this relation holds, $P\{A \mid BC\} \ne P\{A \mid B\}$, and PA > 0, then B and
C are independent. (Strait [128])


10. Show that the converse of Theorem 2 also holds. Thus A and B are
independent if, and only if, A and $B^c$ are independent, and so on.


CHAPTER 2 


Random Variables 
and Their 
Probability Distributions 


2.1 INTRODUCTION


In Chapter 1 we dealt essentially with random experiments which can be
described by finite sample spaces. We studied the assignment and computation
of probabilities of events. In practice, one observes a function defined
on the space of outcomes. Thus, if a coin is tossed n times, one is not
interested in knowing which of the $2^n$ n-tuples in the sample space has occurred.
Rather, one would like to know the number of heads in n tosses. In games
of chance one is interested in the net gain or loss of a certain player.
Actually in Chapter 1 we were concerned with such functions without defining the
term random variable. Here we study the notion of a random variable and
examine some of its properties.

In Section 2 we define a random variable, while in Section 3 we study the
notion of probability distribution of a random variable. Section 4 deals with
some special types of random variables, and Section 5 considers functions
of a random variable and their induced distributions.

The fundamental difference between a random variable and a real-valued
function of a real variable is the associated notion of a probability
distribution. Nevertheless our knowledge of advanced calculus or real analysis is the
basic tool in the study of random variables and their probability
distributions.


2.2 RANDOM VARIABLES


In Chapter 1 we studied properties of a set function P defined on a sample
space $(\Omega, \mathcal{S})$. Since P is a set function, it is not very easy to handle.
Moreover, in practice one frequently observes some function of elementary events.
When a coin is tossed repeatedly, which replication resulted in heads is not
of much interest. Rather one is interested in the number of heads, and
consequently the number of tails, that appear in, say, n tossings of the coin. It
is therefore desirable to introduce a point function on the sample space.
We can then use our knowledge of advanced calculus.


Definition 1. Let $(\Omega, \mathcal{S})$ be a sample space. A finite, single-valued
function X which maps $\Omega$ into $\mathcal{R}$ is called a random variable (rv) if
the inverse images under X of all Borel sets in $\mathcal{R}$ are events, that is,
if


(1)    $X^{-1}(B) = \{\omega : X(\omega) \in B\} \in \mathcal{S}$  for all $B \in \mathfrak{B}$.


Let $x \in \mathcal{R}$, and consider the semiclosed interval $(-\infty, x]$. Since
$(-\infty, x] \in \mathfrak{B}$, it follows that if X is an rv, then
$X^{-1}(-\infty, x] = \{X(\omega) \le x\}$ is an event in $\mathcal{S}$. Also, if B is a
Borel set in $\mathcal{R}$, then B can be obtained by a countable number of
operations of unions, intersections, and differences of semiclosed
intervals. The following result is obtained, using the properties of
inverse images under X (see P.2.16).


Theorem 1. X is an rv if and only if for each $x \in \mathcal{R}$

(2)    $\{\omega : X(\omega) \le x\} = \{X \le x\} \in \mathcal{S}$.


Remark 1. Note that the notion of probability does not enter into the
definition of an rv.


Remark 2. If X is an rv, the sets $\{X = x\}$, $\{a < X \le b\}$, $\{X < x\}$,
$\{a \le X < b\}$, $\{a < X < b\}$, $\{a \le X \le b\}$ are all events. Indeed, we
could have defined an rv in the following equivalent manner: X is an rv
if and only if


(3)    $\{\omega : X(\omega) < x\} \in \mathcal{S}$  for all $x \in \mathcal{R}$.

We have

(4)    $\{X < x\} = \bigcup_{n=1}^{\infty} \{X \le x - 1/n\}$

and

(5)    $\{X \le x\} = \bigcap_{n=1}^{\infty} \{X < x + 1/n\}$.


Theorem 2. Let X be an rv defined on $(\Omega, \mathcal{S})$ and a, b be constants.
Then aX + b is also an rv on $(\Omega, \mathcal{S})$.


Proof.

    $\{\omega : aX(\omega) + b \le x\} = \{aX \le x - b\}$.

If a > 0, then

    $\{aX \le x - b\} = \{X \le (x - b)/a\} \in \mathcal{S}$.

If a < 0, then

    $\{aX \le x - b\} = \{X \ge (x - b)/a\} = \{X < (x - b)/a\}^c \in \mathcal{S}$.

If a = 0, then

    $\{aX \le x - b\} = \begin{cases} \Omega & \text{if } x - b \ge 0, \\ \emptyset & \text{if } x - b < 0. \end{cases}$


The proof is complete.


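Example 1. For any set $A \subset \Omega$, define


    $I_A(\omega) = \begin{cases} 1, & \omega \in A, \\ 0, & \omega \notin A. \end{cases}$

$I_A(\omega)$ is called the indicator function of set A. $I_A$ is an rv if and
only if $A \in \mathcal{S}$.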


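Example 2. Let $\Omega = \{H, T\}$, and $\mathcal{S}$ be the class of all subsets of
$\Omega$. Define X by X(H) = 1, X(T) = 0. Then


    $X^{-1}(-\infty, x] = \begin{cases} \emptyset & \text{if } x < 0, \\ \{T\} & \text{if } 0 \le x < 1, \\ \{H, T\} & \text{if } 1 \le x, \end{cases}$


and we see that X is an rv.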


Example 3. Let $\Omega = \{HH, TT, HT, TH\}$, and $\mathcal{S}$ be the class of all
subsets of $\Omega$. Define X by


    $X(\omega)$ = number of H's in $\omega$.


Then X(HH) = 2, X(HT) = X(TH) = 1, and X(TT) = 0, and

    $X^{-1}(-\infty, x] = \begin{cases} \emptyset, & x < 0, \\ \{TT\}, & 0 \le x < 1, \\ \{TT, HT, TH\}, & 1 \le x < 2, \\ \Omega, & 2 \le x. \end{cases}$


Thus X is an rv.
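
The inverse images in Example 3 can be listed mechanically. The following
is a minimal sketch in Python; the names OMEGA, X, and inverse_image are
our own and are introduced only for illustration. Since every subset of a
finite sample space is taken to be an event, each set returned is an
event.

```python
# Sample space for two tosses of a coin, as in Example 3.
OMEGA = frozenset({"HH", "HT", "TH", "TT"})

def X(outcome):
    """Number of heads in the outcome."""
    return outcome.count("H")

def inverse_image(x):
    """Return X^{-1}(-inf, x] = {w in OMEGA : X(w) <= x}."""
    return frozenset(w for w in OMEGA if X(w) <= x)

# The inverse image changes only at x = 0, 1, 2.
for x in (-1, 0, 0.5, 1, 1.5, 2):
    print(x, sorted(inverse_image(x)))
```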


Remark 3. Let $(\Omega, \mathcal{S})$ be a discrete sample space; that is, let $\Omega$ be
a countable set of points, and $\mathcal{S}$ be the class of all subsets of
$\Omega$. Then every numerical valued function defined on $(\Omega, \mathcal{S})$ is an rv.


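Example 4. Let $\Omega = [0, 1]$ and $\mathcal{S} = \mathfrak{B} \cap [0, 1]$ be the
$\sigma$-field of Borel sets on [0, 1]. Define X on $\Omega$ by

    $X(\omega) = \omega$,  $\omega \in [0, 1]$.

Clearly X is an rv. Any Borel subset of $\Omega$ is an event.


Remark 4. Let X be an rv. Then $X^2$ is an rv and 1/X is also an rv,
provided that $\{X = 0\} = \emptyset$. For $\{X^2 \le x\} = \emptyset$ if x < 0 and if
$x \ge 0$, then $\{X^2 \le x\} = \{-\sqrt{x} \le X \le \sqrt{x}\} \in \mathcal{S}$. Similarly,


    $\{1/X \le x\} = \{1/X \le x,\ X > 0\} \cup \{1/X \le x,\ X < 0\}$
    $= [\{xX \ge 1\} \cap \{X > 0\}] \cup [\{xX \le 1\} \cap \{X < 0\}]$
    $= \begin{cases} \{X < 0\} & \text{if } x = 0, \\ [\{X \ge 1/x\} \cap \{X > 0\}] \cup [\{X \le 1/x\} \cap \{X < 0\}] & \text{if } x > 0, \\ [\{X \le 1/x\} \cap \{X > 0\}] \cup [\{X \ge 1/x\} \cap \{X < 0\}] & \text{if } x < 0. \end{cases}$

For a general result see Theorem 2.5.1.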


PROBLEMS 2.2

1. Let X be the number of heads in three tosses of a coin. What is $\Omega$?
What are the values that X assigns to points of $\Omega$? What are the events
$\{X \le 2.75\}$, $\{.5 \le X \le 1.72\}$?


2. A die is tossed two times. Let X be the sum of face values on the two
tosses, and Y be the absolute value of the difference in face values. What
is $\Omega$? What values do X and Y assign to points of $\Omega$? Check to see
whether X and Y are random variables.


3. Let X be an rv. Is |X| also an rv? If X is an rv that takes only
nonnegative values, is $\sqrt{X}$ also an rv?

4. A die is rolled five times. Let X be the sum of face values. Write the
events $\{X = 4\}$, $\{X = 6\}$, $\{X = 30\}$, $\{X \ge 29\}$.


5. Let $\Omega = [0, 1]$, and $\mathcal{S}$ be the Borel $\sigma$-field of subsets of
$\Omega$. Define X on $\Omega$ as follows: $X(\omega) = \omega$ if $0 \le \omega \le 1/2$, and
$X(\omega) = \omega - 1/2$ if $1/2 < \omega \le 1$. Is X an rv? If so, what is the
event $\{\omega : X(\omega) \in (1/4, 1/2)\}$?


2.3 PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE 


In Section 2.2 we introduced the concept of an rv and noted that the
concept of probability on the sample space was not involved in this
definition. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let X be an rv
defined on our probability space.


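Theorem 1. The rv X defined on the probability space $(\Omega, \mathcal{S}, P)$
induces a probability space $(\mathcal{R}, \mathfrak{B}, Q)$ by means of the
correspondence


(1)    $Q(B) = P\{X^{-1}(B)\} = P\{\omega : X(\omega) \in B\}$  for all $B \in \mathfrak{B}$.

We write $Q = PX^{-1}$ and call Q or $PX^{-1}$ the (probability) distribution
of X.

Proof. Clearly $Q(B) \ge 0$ for all $B \in \mathfrak{B}$, and also
$Q(\mathcal{R}) = P\{X \in \mathcal{R}\} = P(\Omega) = 1$.


Let $B_i \in \mathfrak{B}$, i = 1, 2, ... with $B_i \cap B_j = \emptyset$, $i \ne j$. Since the
inverse image of a disjoint union of Borel sets is the disjoint union of
their inverse images, we have


    $Q(\bigcup_{i=1}^{\infty} B_i) = P\{X^{-1}(\bigcup_{i=1}^{\infty} B_i)\}$
    $= P\{\bigcup_{i=1}^{\infty} X^{-1}(B_i)\}$
    $= \sum_{i=1}^{\infty} PX^{-1}(B_i) = \sum_{i=1}^{\infty} Q(B_i)$.


It follows that $(\mathcal{R}, \mathfrak{B}, Q)$ is a probability space, and the proof
is complete.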


Since Q is a set function and set functions are not easy to handle, let us
introduce a point function on $\mathcal{R}$.


Definition 1. A real-valued function F defined on $(-\infty, \infty)$ that is
nondecreasing, right continuous and satisfies

    $F(-\infty) = 0$  and  $F(+\infty) = 1$

is called a distribution function (df).


Remark 1. From our knowledge of calculus (see P.2.10) we see that
if F is a nondecreasing function on $\mathcal{R}$, then
$F(x-) = \lim_{t \uparrow x} F(t)$, $F(x+) = \lim_{t \downarrow x} F(t)$ exist and are
finite. Also, $F(+\infty)$ and $F(-\infty)$ exist as $\lim_{t \uparrow +\infty} F(t)$ and
$\lim_{t \downarrow -\infty} F(t)$, respectively. In general,


    $F(x-) \le F(x) \le F(x+)$,


and x is a jump point of F if and only if F(x+) and F(x-) exist but are
unequal. Thus a nondecreasing function F has only jump discontinuities. If
we define


    $F^*(x) = F(x+)$  for all x,


we see that $F^*$ is nondecreasing and right continuous on $\mathcal{R}$. Thus in
Definition 1 the nondecreasing part is very important. Some authors demand
left continuity in the definition of a df instead of right continuity.


Theorem 2. The set of discontinuity points of a df F is at most countable.


Proof. Let (a, b] be a finite interval with at least n discontinuity
points:

    $a < x_1 < x_2 < \cdots < x_n \le b$.

Then

    $F(a) \le F(x_1-) < F(x_1) \le \cdots \le F(x_n-) < F(x_n) \le F(b)$.

Let $p_k = F(x_k) - F(x_k-)$, k = 1, 2, ..., n. Clearly


    $\sum_{k=1}^{n} p_k \le F(b) - F(a)$,


and it follows that the number of points x in (a, b] with jump
$p(x) > \varepsilon > 0$ is at most $\varepsilon^{-1}\{F(b) - F(a)\}$. Thus, for every integer
N, the number of discontinuity points with jump greater than 1/N is
finite. It follows that there are no more than a countable number of
discontinuity points in every finite interval (a, b]. Since $\mathcal{R}$ is a
countable union of such intervals, the proof is complete.


Definition 2. Let X be an rv defined on $(\Omega, \mathcal{S}, P)$. Define a point
function F(.) on $\mathcal{R}$ by

(2)    $F(x) = P\{\omega : X(\omega) \le x\}$  for all $x \in \mathcal{R}$.

The function F is called the distribution function of rv X.


If there is no confusion, we will write

    $F(x) = P\{X \le x\}$.

The following result justifies our calling F as defined by (2) a df.


Theorem 3. The function F defined in (2) is indeed a df. 


Proof. Let $x_1 < x_2$. Then $(-\infty, x_1] \subset (-\infty, x_2]$, and we have

    $F(x_1) = P\{X \le x_1\} \le P\{X \le x_2\} = F(x_2)$.


Since F is nondecreasing, it is sufficient to show that for any sequence
of numbers $x_n \downarrow x$, $x_1 > x_2 > \cdots > x_n > \cdots > x$, $F(x_n) \to F(x)$.
Let $A_k = \{\omega : X(\omega) \in (x, x_k]\}$. Then $A_k \in \mathcal{S}$ and $A_k$ is
nonincreasing. Also,


    $\lim_{k \to \infty} A_k = \bigcap_{k=1}^{\infty} A_k = \emptyset$,

since none of the intervals $(x, x_k]$ contains x. It follows that
$\lim_{k \to \infty} P(A_k) = 0$.


But


    $P(A_k) = P\{X \le x_k\} - P\{X \le x\}$
           $= F(x_k) - F(x)$,


so that

    $\lim_{k \to \infty} F(x_k) = F(x)$,


and F is right continuous.

Finally, let $\{x_n\}$ be a sequence of numbers decreasing to $-\infty$. Then

    $\{X \le x_n\} \supseteq \{X \le x_{n+1}\}$  for each n

and

    $\lim_{n \to \infty} \{X \le x_n\} = \bigcap_{n=1}^{\infty} \{X \le x_n\} = \emptyset$.

Therefore

    $F(-\infty) = \lim_{n \to \infty} P\{X \le x_n\} = P\{\lim_{n \to \infty} \{X \le x_n\}\} = 0$.


Similarly,

    $F(+\infty) = \lim_{x_n \to \infty} P\{X \le x_n\} = 1$,

and the proof is complete.


The next result, which we state without proof, establishes a
correspondence between the induced probability Q on $(\mathcal{R}, \mathfrak{B})$ and a
point function F defined on $\mathcal{R}$.


Theorem 4. Given a probability Q on $(\mathcal{R}, \mathfrak{B})$, there exists a
distribution function F satisfying

(3)    $Q(-\infty, x] = F(x)$  for all $x \in \mathcal{R}$,

and, conversely, given a df F, there exists a unique probability Q defined
on $(\mathcal{R}, \mathfrak{B})$ that satisfies (3).

For proof see Chung [15], pages 23-24.


Theorem 5. Every df is the df of an rv on some probability space.

Proof. Let F be a df. From Theorem 4 it follows that there exists a unique
probability Q defined on $\mathcal{R}$ that satisfies

    $Q(-\infty, x] = F(x)$  for all $x \in \mathcal{R}$.

Let $(\mathcal{R}, \mathfrak{B}, Q)$ be the probability space on which we define

    $X(\omega) = \omega$,  $\omega \in \mathcal{R}$.

Then

    $Q\{\omega : X(\omega) \le x\} = Q(-\infty, x] = F(x)$,

and F is the df of rv X.


Remark 2. If X is an rv on $(\Omega, \mathcal{S}, P)$, we have seen (Theorem 3) that
$F(x) = P\{X \le x\}$ is a df associated with X. Theorem 5 assures us that to
every df F we can associate some rv. Thus, given an rv, there exists a df,
and conversely. In this book when we speak of an rv we will assume that
it is defined on some probability space.


Example 1. Let X be defined on $(\Omega, \mathcal{S}, P)$ by

    $X(\omega) = c$  for all $\omega \in \Omega$.

Then

    $P\{X = c\} = 1$,
    $F(x) = Q(-\infty, x] = P\{X^{-1}(-\infty, x]\} = 0$  if $x < c$

and

    $F(x) = 1$  if $x \ge c$.


Example 2. Let $\Omega = \{H, T\}$, and X be defined by


    X(H) = 1,  X(T) = 0.

If P assigns equal mass to {H} and {T}, then


    $P\{X = 0\} = 1/2 = P\{X = 1\}$,


and

    $F(x) = Q(-\infty, x] = \begin{cases} 0, & x < 0, \\ 1/2, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$


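Example 3. Let $\Omega = \{(i, j) : i, j \in \{1, 2, 3, 4, 5, 6\}\}$, and $\mathcal{S}$ be
the set of all subsets of $\Omega$. Let $P\{(i, j)\} = 1/6^2$ for all $6^2$ pairs
(i, j) in $\Omega$. Define

    $X(i, j) = i + j$,  $1 \le i, j \le 6$.

Then

    $F(x) = Q(-\infty, x] = P\{X \le x\} = \begin{cases} 0, & x < 2, \\ 1/36, & 2 \le x < 3, \\ 3/36, & 3 \le x < 4, \\ 6/36, & 4 \le x < 5, \\ \ \vdots & \\ 35/36, & 11 \le x < 12, \\ 1, & 12 \le x. \end{cases}$
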
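
The steps of the df in Example 3 can be obtained by direct counting. The
following is a minimal sketch in Python using exact rational arithmetic;
the names FACES, PAIRS, and F are our own choices for illustration.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
PAIRS = list(product(FACES, FACES))   # the 36 equally likely points (i, j)

def F(x):
    """F(x) = P{X <= x} for X(i, j) = i + j under equal mass 1/36."""
    favorable = sum(1 for i, j in PAIRS if i + j <= x)
    return Fraction(favorable, len(PAIRS))

# F is constant between consecutive integers, so integer arguments
# from 1 to 12 show every step of the df.
for x in range(1, 13):
    print(x, F(x))
```

Evaluating F at a non-integer point such as 3.5 returns the same value as
at 3, which reflects the fact that F is a right-continuous step function.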

Example 4. We return to Example 2.2.4. For every subinterval I of [0, 1]
let P(I) be the length of the interval. Then $(\Omega, \mathcal{S}, P)$ is a
probability space, and the df of rv $X(\omega) = \omega$, $\omega \in \Omega$, is given by
F(x) = 0 if x < 0, $F(x) = P\{\omega : X(\omega) \le x\} = P([0, x]) = x$ if
$x \in [0, 1)$, and F(x) = 1 if $x \ge 1$.

PROBLEMS 2.3 


1. Write the df of rv X defined in Problem 2.2.1, assuming that the coin is fair. 


2. What is the df of rv Y defined in Problem 2.2.2, assuming that the die is not 
loaded? 


3. Do the following functions define df's?

(a) F(x) = 0 if x < 0, = x if $0 \le x < 1/2$, and = 1 if $x \ge 1/2$.

(b) $F(x) = (1/\pi) \tan^{-1} x$, $-\infty < x < \infty$.

(c) F(x) = 0 if $x \le 1$, and = 1 - (1/x) if 1 < x.

(d) $F(x) = 1 - e^{-x}$ if $x \ge 0$, and = 0 if x < 0.

4. Let X be an rv with df F. 

(a) If Fis the df defined in Problem 3(a), find P(X > +), Р( < X <i). 
(b) If F is the df defined in Problem 3(d), find $P\{-\infty < X < 2\}$.


2.4 DISCRETE AND CONTINUOUS RANDOM VARIABLES 


Let X be an rv defined on some fixed, but otherwise arbitrary, probability
space $(\Omega, \mathcal{S}, P)$, and let F be the df of X. In this book we shall
restrict ourselves to two types of rv's, namely, the case in which the rv
assumes at most a countable number of values, and that in which the df F
is absolutely continuous (see P.2.15).


Definition 1. An rv X defined on $(\Omega, \mathcal{S}, P)$ is said to be of the
discrete type, or simply discrete, if there exists a countable set
$E \subset \mathcal{R}$ such that $P\{X \in E\} = 1$. The points of E which have
positive mass are called jump points or points of increase of the df of X,
and their probabilities are called jumps of the df.


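Note that $E \in \mathfrak{B}$ since every one-point set is in $\mathfrak{B}$. Indeed, if
$x \in \mathcal{R}$, then


(1)    $\{x\} = \bigcap_{n=1}^{\infty} \{y : x - 1/n < y \le x + 1/n\}$.


Thus $\{X \in E\}$ is an event. Let X take on the value $x_i$ with probability
$p_i$ (i = 1, 2, ...). We have

    $P\{\omega : X(\omega) = x_i\} = p_i$,  i = 1, 2, ...,  $p_i \ge 0$ for all i.

Then $\sum_{i=1}^{\infty} p_i = 1$.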


Definition 2. The collection of numbers $\{p_i\}$ satisfying
$P\{X = x_i\} = p_i \ge 0$ for all i and $\sum_{i=1}^{\infty} p_i = 1$ is called the
probability mass function (pmf) of rv X.


The df F of X is given by

(2)    $F(x) = P\{X \le x\} = \sum_{x_i \le x} p_i$.

If $I_A$ denotes the indicator function of the set A, we may write

(3)    $X(\omega) = \sum_{i=1}^{\infty} x_i I_{[X = x_i]}(\omega)$.

Let us define a function $\varepsilon(x)$ as follows:


    $\varepsilon(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}$

Then we have

(4)    $F(x) = \sum_{i=1}^{\infty} p_i \varepsilon(x - x_i)$.
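
The representation of a discrete df as a weighted sum of unit steps can be
sketched as follows. This is a minimal illustration in Python; the
function names eps and df_from_pmf and the particular pmf are hypothetical
and are not taken from the text.

```python
from fractions import Fraction

def eps(x):
    """Unit step: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def df_from_pmf(pmf):
    """Build F(x) = sum_i p_i * eps(x - x_i) from a dict {x_i: p_i}."""
    if sum(pmf.values()) != 1:
        raise ValueError("masses must sum to 1")

    def F(x):
        return sum(p * eps(x - xi) for xi, p in pmf.items())

    return F

# Hypothetical pmf, chosen only for illustration.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
F = df_from_pmf(pmf)
print([F(x) for x in (-1, 0, 0.5, 1, 2, 3)])
```

Because eps is right continuous, F inherits right continuity, and the jump
of F at each $x_i$ equals the mass $p_i$.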


Example 1. The simplest example is that of an rv X degenerate at c,
$P\{X = c\} = 1$:


    $F(x) = \varepsilon(x - c) = \begin{cases} 0, & x < c, \\ 1, & x \ge c. \end{cases}$


Example 2. A box contains good and defective items. If an item drawn is
good, we assign the number 1 to the drawing; otherwise, the number 0. Let
p be the probability of drawing at random a good item. Then


    $P\{X = 1\} = p$,  $P\{X = 0\} = 1 - p$,


and

    $F(x) = P\{X \le x\} = \begin{cases} 0, & x < 0, \\ 1 - p, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$


Example 3. Let X be an rv with pmf 


The following result is obvious. 


Theorem 1. Let $\{p_k\}$ be a collection of nonnegative real numbers such
that $\sum_{k=1}^{\infty} p_k = 1$. Then $\{p_k\}$ is the pmf of some rv X.


We next consider rv's associated with df's that have no jump points. The
df of such rv's is continuous. We shall restrict our attention to a
special subclass of such rv's.


Definition 3. Let X be an rv defined on $(\Omega, \mathcal{S}, P)$ with df F. Then X
is said to be of the continuous type (or, simply, continuous) if F is
absolutely continuous, that is, if there exists a nonnegative function
f(x) such that for every real number x we have


(5)    $F(x) = \int_{-\infty}^{x} f(t)\,dt$.

The function f is called the probability density function (pdf) of the
rv X.


Note that $f \ge 0$ and satisfies
$\lim_{x \to +\infty} F(x) = F(+\infty) = \int_{-\infty}^{\infty} f(t)\,dt = 1$.
Let a and b be any two real numbers with a < b. Then


    $P\{a < X \le b\} = F(b) - F(a)$
                   $= \int_a^b f(t)\,dt$.

Let B be a Borel set of the real line. Since B can be obtained by a
countable number of operations of unions, intersections, and differences
on intervals, the following result holds.


Theorem 2. Let X be an rv of the continuous type with pdf f. Then for
every Borel set $B \in \mathfrak{B}$


(6)    $P\{X \in B\} = \int_B f(t)\,dt$.

If F is absolutely continuous and f is continuous at x, we have


(7)    $F'(x) = \dfrac{dF(x)}{dx} = f(x)$.


Theorem 3. Every nonnegative real function f that is integrable over
$\mathcal{R}$ and satisfies


    $\int_{-\infty}^{\infty} f(x)\,dx = 1$


is the pdf of some continuous type rv X.


Proof. In view of Theorem 2.3.5 it suffices to show that there
corresponds a df F to f. Define


    $F(x) = \int_{-\infty}^{x} f(t)\,dt$,  $x \in \mathcal{R}$.

Then $F(-\infty) = 0$, $F(+\infty) = 1$, and, if $x_2 > x_1$,


    $F(x_2) = \left(\int_{-\infty}^{x_1} + \int_{x_1}^{x_2}\right) f(t)\,dt \ge \int_{-\infty}^{x_1} f(t)\,dt = F(x_1)$.

Finally, F is (absolutely) continuous and hence continuous from the right.


Remark 1. In the discrete case, $P\{X = a\}$ is the probability that X takes
the value a. In the continuous case, f(a) is not the probability that X
takes the value a. Indeed, if X is of the continuous type, it assumes
every value with probability 0.


Theorem 4. Let X be any rv. Then

(8)    $P\{X = a\} = \lim_{t \to a,\ t < a} P\{t < X \le a\}$.


Proof. Let $t_1 < t_2 < \cdots < a$, $t_n \to a$, and write

    $A_n = \{t_n < X \le a\}$.


Then $A_n$ is a nonincreasing sequence of events which converges to
$\bigcap_{n=1}^{\infty} A_n = \{X = a\}$. It follows that
$\lim_{n \to \infty} P(A_n) = P\{X = a\}$.


Remark 2. Since $P\{t < X \le a\} = F(a) - F(t)$, it follows that

    $\lim_{t \to a,\ t < a} P\{t < X \le a\} = P\{X = a\} = F(a) - \lim_{t \to a,\ t < a} F(t)$
                                        $= F(a) - F(a-)$.

Thus F has a jump discontinuity at a if and only if $P\{X = a\} > 0$, that
is, F is continuous at a if and only if $P\{X = a\} = 0$. If X is an rv of
the continuous type, $P\{X = a\} = 0$ for all $a \in \mathcal{R}$. Moreover,


    $P\{X \in \mathcal{R} - \{a\}\} = 1$.

This justifies Remark 4 in Section 1.3.

Example 4. Let X be an rv with df F given by (Fig. 1)

    $F(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x \le 1, \\ 1, & 1 < x. \end{cases}$

Differentiating F with respect to x at continuity points of f, we get

    $f(x) = F'(x) = \begin{cases} 0, & x < 0 \text{ or } x > 1, \\ 1, & 0 < x < 1. \end{cases}$


The function f is not continuous at x = 0, or at x = 1 (Fig. 2). We may
define f(0) and f(1) in any manner. Choosing f(0) = f(1) = 0, we have

    $f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$

[Fig. 1: graph of F(x). Fig. 2: graph of f(x).]

Then

    $P\{.4 < X \le .6\} = F(.6) - F(.4) = .2$.


Example 5. Let X have the triangular pdf (Fig. 3)


    $f(x) = \begin{cases} x, & 0 < x \le 1, \\ 2 - x, & 1 \le x \le 2, \\ 0, & \text{otherwise}. \end{cases}$

It is easy to check that f is a pdf. For the df F of X we have (Fig. 4)

    $F(x) = 0$  if $x \le 0$,

    $F(x) = \int_0^x t\,dt = x^2/2$  if $0 < x \le 1$,

    $F(x) = \int_0^1 t\,dt + \int_1^x (2 - t)\,dt = 2x - x^2/2 - 1$  if $1 < x \le 2$,

and

    $F(x) = 1$  if $x \ge 2$.

[Fig. 3: graph of f. Fig. 4: graph of F.]

Then

    $P\{.3 < X \le 1.5\} = P\{X \le 1.5\} - P\{X \le .3\}$
                       $= .83$.
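
As a check on Example 5, the closed-form df can be compared with a direct
numerical integration of the triangular pdf. The sketch below is in
Python; the helper integrate is a plain midpoint rule written for this
illustration and is not part of the text.

```python
def f(x):
    """Triangular pdf: x on (0, 1], 2 - x on (1, 2], 0 elsewhere."""
    if 0 < x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def F(x):
    """Closed-form df of the triangular pdf."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2
    if x <= 2:
        return 2 * x - x * x / 2 - 1
    return 1.0

def integrate(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

prob = F(1.5) - F(0.3)              # P{.3 < X <= 1.5} from the df
check = integrate(f, 0.3, 1.5)      # the same probability from the pdf
print(prob, check)
```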


Example 6. Let k > 0 be a constant, and

    $f(x) = \begin{cases} kx(1 - x), & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$


Then $\int_0^1 f(x)\,dx = k/6$. It follows that f(x) defines a pdf if k = 6. We
have

    $P\{X > .3\} = 1 - 6 \int_0^{.3} x(1 - x)\,dx = .784$.
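
The two numbers in Example 6 can be confirmed with exact rational
arithmetic. The sketch below, in Python, reads the lower limit of the tail
probability as .3, which is what the stated value .784 requires; the
function name integral_x_one_minus_x is our own.

```python
from fractions import Fraction

def integral_x_one_minus_x(t):
    """Exact integral of x(1 - x) over [0, t], namely t^2/2 - t^3/3."""
    t = Fraction(t)
    return t**2 / 2 - t**3 / 3

# Normalizing constant: k times the integral over [0, 1] must equal 1.
k = 1 / integral_x_one_minus_x(1)

# Tail probability P{X > 3/10} = 1 - k * (integral over [0, 3/10]).
tail = 1 - k * integral_x_one_minus_x(Fraction(3, 10))
print(k, tail, float(tail))
```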


We conclude this discussion by emphasizing that the two types of rv's
considered above form only a small part of the class of all rv's. These two
classes, however, contain practically all the random variables that arise
in practice. We note without proof (see Chung [15], 9) that every df F can
be decomposed into two parts according to


(9)    $F(x) = aF_d(x) + (1 - a)F_c(x)$.


Here $F_d$ and $F_c$ are both df's; $F_d$ is the df of a discrete rv, while
$F_c$ is a continuous (not necessarily absolutely continuous) df. In fact,
$F_c$ can be further decomposed, but we will not go into that. (See Chung
[15], 11.)


Example 7. Let X be an rv with df


    $F(x) = \begin{cases} 0, & x < 0, \\ 1/2, & x = 0, \\ 1/2 + x/2, & 0 < x < 1, \\ 1, & 1 \le x. \end{cases}$

Note that the df F has a jump at x = 0 and F is continuous (in fact,
absolutely continuous) in the interval (0, 1). F is the df of an rv X that
is neither discrete nor continuous. We can write


    $F(x) = \tfrac{1}{2} F_d(x) + \tfrac{1}{2} F_c(x)$,

where


    $F_d(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$

and

    $F_c(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x < 1, \\ 1, & 1 \le x. \end{cases}$


Here $F_d(x)$ is the df of the rv degenerate at x = 0, and $F_c(x)$ is the df
with pdf


    $f_c(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$


PROBLEMS 2.4 


1. Let

    $p_k = p(1 - p)^k$,  k = 0, 1, 2, ...,  0 < p < 1.

Does $\{p_k\}$ define the pmf of some rv? What is the df of this rv? If X is
an rv with pmf $\{p_k\}$, what is $P\{n \le X \le N\}$, where n, N (N > n) are
positive integers?

2. In Problem 2.3.3, find the pdf associated with the df's of parts (a),
(c), and (d).


3. Does the function $f_\theta(x) = \theta^2 x e^{-\theta x}$ if x > 0, and = 0 if
$x \le 0$, where $\theta > 0$, define a pdf? Find the df associated with
$f_\theta(x)$; if X is an rv with pdf $f_\theta(x)$, find $P\{X \ge 1\}$.


4. Does the function $f_\theta(x) = \{(x + 1)/[\theta(\theta + 1)]\} e^{-x/\theta}$ if x > 0,
and = 0 otherwise, where $\theta > 0$, define a pdf? Find the corresponding df.


5. For what values of K do the following functions define the pmf of some
rv?

(a) $f(x) = K(\lambda^x / x!)$, x = 0, 1, 2, ..., $\lambda > 0$.

(b) $f(x) = K/N$, x = 1, 2, ..., N.

6. Show that the function


    $f(x) = \tfrac{1}{2} e^{-|x|}$,  $-\infty < x < \infty$,


is a pdf.

7. For the pdf f(x) = x if $0 \le x < 1$, and = 2 - x if $1 \le x < 2$, find
$P\{1/6 < X \le 7/4\}$.


8. Prove Theorems 1 and 2.


2.5 FUNCTIONS OF A RANDOM VARIABLE


Let X be an rv with a known distribution, and let g be a function defined
on the real line. We seek the distribution of Y = g(X), provided that Y is
also an rv. We first prove the following result.


Theorem 1. Let X be an rv defined on $(\Omega, \mathcal{S}, P)$. Also, let g be a
Borel-measurable function on $\mathcal{R}$. Then g(X) is also an rv.

Proof.

    $\{g(X) \le y\} = \{X \in g^{-1}(-\infty, y]\}$,

and since g is Borel-measurable, $g^{-1}(-\infty, y]$ is a Borel set. It follows
that $\{g(X) \le y\} \in \mathcal{S}$, and the proof is complete.


Theorem 2. Given an rv X with a known df, the distribution of the rv
Y = g(X), where g is a Borel-measurable function, is determined.

Proof. We have

(1)    $P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}$.

In what follows, we will always assume that the functions under consider- 
ation are Borel-measurable. 


Example 1. Let X be an rv with df F. Then |X|, aX + b (where $a \ne 0$ and
b are constants), $X^k$ (where k > 0 is an integer), and $|X|^\alpha$
($\alpha > 0$) are all rv's. Define


    $X^+ = \begin{cases} X, & X \ge 0, \\ 0, & X < 0, \end{cases}$

and

    $X^- = \begin{cases} X, & X \le 0, \\ 0, & X > 0. \end{cases}$

Then $X^+$, $X^-$ are also rv's. We have

    $P\{|X| \le y\} = P\{-y \le X \le y\} = P\{X \le y\} - P\{X < -y\}$
                 $= F(y) - F(-y) + P\{X = -y\}$,  y > 0;

    $P\{aX + b \le y\} = P\{aX \le y - b\}$
                    $= \begin{cases} P\{X \le (y - b)/a\} & \text{if } a > 0, \\ P\{X \ge (y - b)/a\} & \text{if } a < 0; \end{cases}$

    $P\{X^+ \le y\} = \begin{cases} 0 & \text{if } y < 0, \\ P\{X \le 0\} & \text{if } y = 0, \\ P\{X < 0\} + P\{0 \le X \le y\} & \text{if } y > 0. \end{cases}$

Similarly,

    $P\{X^- \le y\} = \begin{cases} 1 & \text{if } y \ge 0, \\ P\{X \le y\} & \text{if } y < 0. \end{cases}$

Let X be an rv of the discrete type, and A be the countable set such that
$P\{X \in A\} = 1$ and $P\{X = x\} > 0$ for $x \in A$. Let Y = g(X) be a
one-to-one mapping from A onto some set B. Then the inverse map, $g^{-1}$,
is a single-valued function of y. To find $P\{Y = y\}$, we note that

    $P\{Y = y\} = P\{g(X) = y\} = P\{X = g^{-1}(y)\}$,  $y \in B$,

and $P\{Y = y\} = 0$, $y \in B^c$.


Example 2. Let X be a Poisson rv with pmf


    $P\{X = k\} = \begin{cases} e^{-\lambda} \lambda^k / k!, & k = 0, 1, 2, \ldots;\ \lambda > 0, \\ 0, & \text{otherwise}. \end{cases}$


Let $Y = X^2 + 3$. Then $y = x^2 + 3$ maps A = {0, 1, 2, ...} onto
B = {3, 4, 7, 12, 19, 28, ...}. The inverse map is $x = \sqrt{y - 3}$, and
since there are no negative values in A we take the positive square root
of y - 3. We have


    $P\{Y = y\} = P\{X = \sqrt{y - 3}\} = \dfrac{e^{-\lambda} \lambda^{\sqrt{y-3}}}{\sqrt{y - 3}\,!}$,  $y \in B$,

and $P\{Y = y\} = 0$ elsewhere.


Actually the restriction of a single-valued inverse on g is not necessary.
If g has a finite (or even a countable) number of inverses for each y, from
countable additivity of P we have


    $P\{Y = y\} = P\{g(X) = y\} = P\{\bigcup_a [X = a,\ g(a) = y]\}$
               $= \sum_a P\{X = a,\ g(a) = y\}$.
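
The summation over inverses can be carried out by accumulating mass over
the values of g. The following is a minimal sketch in Python; the function
name pmf_of_function and the pmf used to exercise it are hypothetical and
chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_function(pmf_x, g):
    """pmf of Y = g(X): P{Y = y} is the sum of P{X = a} over a with g(a) = y."""
    pmf_y = defaultdict(Fraction)      # missing keys start at mass 0
    for a, p in pmf_x.items():
        pmf_y[g(a)] += p
    return dict(pmf_y)

# Hypothetical pmf: equal mass on the five points -2, -1, 0, 1, 2.
pmf_x = {a: Fraction(1, 5) for a in (-2, -1, 0, 1, 2)}
pmf_y = pmf_of_function(pmf_x, lambda a: a * a)
print(pmf_y)
```

When g is one-to-one on the support, each value of Y receives the mass of
exactly one point, and the routine reduces to the single-valued case.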


Example 3. Let X be an rv with pmf 
Р{Х = -2} =$, Р(Х = -1} =, P{X=0} = +, 
Р{Х = 1} = and P{X=2} =, 
Let $Y = X^2$. Then

    A = {-2, -1, 0, 1, 2},  and  B = {0, 1, 4}.

We have

    $P\{Y = 0\} = P\{X = 0\}$,
    $P\{Y = 1\} = P\{X = -1\} + P\{X = 1\}$,
    $P\{Y = 4\} = P\{X = -2\} + P\{X = 2\}$.


The case in which X is an rv of the continuous type is not as simple. First
we note that, if X is a continuous type rv and g is some Borel-measurable
function, Y = g(X) may not be an rv of the continuous type.


Example 4. Let X be an rv with uniform distribution on [-1, 1], that is,
the pdf of X is f(x) = 1/2, $-1 \le x \le 1$, and = 0 elsewhere. Let
$Y = X^+$. Then, from Example 1,


    $P\{Y \le y\} = \begin{cases} 0, & y < 0, \\ 1/2, & y = 0, \\ 1/2 + y/2, & 1 > y > 0, \\ 1, & y \ge 1. \end{cases}$


We see that the df of Y has a jump at y = 0 and that Y is neither discrete
nor continuous. Note that all we require is that $P\{X < 0\} > 0$.


Example 4 shows that we need some conditions on g to ensure that g( X) 
is also an rv of the continuous type whenever X is continuous. This will be 
the case when g is a continuous monotonic function (see P. 2.10). A suf- 
ficient condition is given in the following theorem. 


Theorem 3. Let X be an rv of the continuous type with pdf f. Let y = g(x)
be differentiable for all x and either g'(x) > 0 for all x or g'(x) < 0
for all x. Then Y = g(X) is also an rv of the continuous type with pdf
given by


(2)    $h(y) = \begin{cases} f[g^{-1}(y)] \left| \dfrac{d}{dy} g^{-1}(y) \right|, & \alpha < y < \beta, \\ 0, & \text{otherwise}, \end{cases}$

where $\alpha = \min\{g(-\infty), g(+\infty)\}$ and $\beta = \max\{g(-\infty), g(+\infty)\}$.

Proof. If g is differentiable for all x and g'(x) > 0 for all x, then g is
continuous and strictly increasing, the limits $\alpha$, $\beta$ exist (may be
infinite), and the inverse function $x = g^{-1}(y)$ exists, is strictly
increasing, and is differentiable (see P.2.10). The df of Y for
$\alpha < y < \beta$ is given by


    $P\{Y \le y\} = P\{X \le g^{-1}(y)\}$.

The pdf of Y is obtained on differentiation. We have

    $h(y) = \dfrac{d}{dy} P\{Y \le y\}$
         $= f[g^{-1}(y)] \dfrac{d}{dy} g^{-1}(y)$.

Similarly, if g' < 0, then g is strictly decreasing and we have


    $P\{Y \le y\} = P\{X \ge g^{-1}(y)\}$
               $= 1 - P\{X \le g^{-1}(y)\}$  (X is a continuous type rv)


so that

    $h(y) = -f[g^{-1}(y)] \dfrac{d}{dy} g^{-1}(y)$.


Since g and $g^{-1}$ are both strictly decreasing, $(d/dy)\, g^{-1}(y)$ is
negative and (2) follows.


Note that (see Theorem P.2.15)


    $\dfrac{d}{dy} g^{-1}(y) = \left. \dfrac{1}{dg(x)/dx} \right|_{x = g^{-1}(y)}$,


so that (2) may be rewritten as


(3)    $h(y) = \left. \dfrac{f(x)}{|dg(x)/dx|} \right|_{x = g^{-1}(y)}$,  $\alpha < y < \beta$.


Remark 1. The key to computation of the induced distribution of Y = g(X)
from the distribution of X is (1). If the conditions of Theorem 3 are
satisfied, we are able to identify the set $\{X \in g^{-1}(-\infty, y]\}$ as
$\{X \le g^{-1}(y)\}$ or $\{X \ge g^{-1}(y)\}$, according to whether g is
increasing or decreasing. In practice Theorem 3 is quite useful, but
whenever the conditions are violated one should return to (1) to compute
the induced distribution. This is the case, for example, in Examples 7 and
8 and Theorem 4 below.


Remark 2. If the pdf f of X vanishes outside an interval [a, b] of finite
length, we need only to assume that g is differentiable in (a, b), and
either g'(x) > 0 or g'(x) < 0 throughout the interval. Then we take


    $\alpha = \min\{g(a), g(b)\}$  and  $\beta = \max\{g(a), g(b)\}$


in Theorem 3.


Example 5. Let X have the density f(x) = 1, 0 < x < 1, and = 0 otherwise.
Let $Y = e^X$. Then X = log Y, and we have


    $h(y) = \left| \dfrac{1}{y} \right| \cdot 1$,  $0 < \log y < 1$,

that is,

    $h(y) = \begin{cases} 1/y, & 1 < y < e, \\ 0, & \text{otherwise}. \end{cases}$

If $Y = -2 \log X$, then $x = e^{-y/2}$ and


    $h(y) = \left| -\tfrac{1}{2} e^{-y/2} \right| \cdot 1$,  $0 < e^{-y/2} < 1$,
         $= \begin{cases} \tfrac{1}{2} e^{-y/2}, & 0 < y < \infty, \\ 0, & \text{otherwise}. \end{cases}$

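Example 6. Let X be a nonnegative rv of the continuous type with pdf f,
and let $\alpha > 0$. Let $Y = X^\alpha$. Then


    $P\{X^\alpha \le y\} = \begin{cases} P\{X \le y^{1/\alpha}\} & \text{if } y \ge 0, \\ 0 & \text{if } y < 0. \end{cases}$


The pdf of Y is given by

    $h(y) = f(y^{1/\alpha}) \left| \dfrac{d}{dy} y^{1/\alpha} \right|$
         $= \begin{cases} \dfrac{1}{\alpha} y^{1/\alpha - 1} f(y^{1/\alpha}), & y > 0, \\ 0, & y \le 0. \end{cases}$


Example 7. Let X be an rv with pdf

    $f(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$,  $-\infty < x < \infty$.


Let $Y = X^2$. In this case, g'(x) = 2x which is > 0 for x > 0, and < 0 for
x < 0, so that the conditions of Theorem 3 are not satisfied. But for
y > 0

    $P\{Y \le y\} = P\{-\sqrt{y} \le X \le \sqrt{y}\}$
               $= F(\sqrt{y}) - F(-\sqrt{y})$,

where F is the df of X. Thus the pdf of Y is given by


    $h(y) = \begin{cases} \dfrac{1}{2\sqrt{y}} \{f(\sqrt{y}) + f(-\sqrt{y})\}, & y > 0, \\ 0, & y \le 0. \end{cases}$


Thus


    $h(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi y}} e^{-y/2}, & 0 < y, \\ 0, & y \le 0. \end{cases}$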


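Example 8. Let X be an rv with pdf

    $f(x) = \begin{cases} 2x/\pi^2, & 0 < x < \pi, \\ 0, & \text{otherwise}. \end{cases}$


Let $Y = \sin X$. Then

    $P\{Y \le y\} = P\{\sin X \le y\}$,  0 < y < 1,
               $= P\{[0 \le X \le x_1] \cup [x_2 \le X \le \pi]\}$,


where $x_1 = \sin^{-1} y$ and $x_2 = \pi - \sin^{-1} y$. Thus

    $P\{Y \le y\} = \int_0^{x_1} f(x)\,dx + \int_{x_2}^{\pi} f(x)\,dx$
               $= (x_1/\pi)^2 + 1 - (x_2/\pi)^2$,

and the pdf of Y is given by

    $h(y) = \dfrac{d}{dy} \left( \dfrac{\sin^{-1} y}{\pi} \right)^2 + \dfrac{d}{dy} \left[ 1 - \left( \dfrac{\pi - \sin^{-1} y}{\pi} \right)^2 \right]$
         $= \begin{cases} \dfrac{2}{\pi \sqrt{1 - y^2}}, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}$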

In Examples 7 and 8 the function y = g(x) can be written as the sum of
two monotone functions.



Theorem 4. Let X be an rv of the continuous type with pdf f. Let y = g(x)
be differentiable for all x, and assume that g'(x) is continuous and
nonzero at all but a finite number of values of x. Then, for every real
number y,


(a) there exist a positive integer n = n(y) and real numbers (inverses)
$x_1(y), x_2(y), \ldots, x_n(y)$ such that


    $g[x_k(y)] = y$,  $g'[x_k(y)] \ne 0$,  k = 1, 2, ..., n(y),


or

(b) there does not exist any x such that g(x) = y, $g'(x) \ne 0$, in which
case we write n(y) = 0.

Then Y is a continuous rv with pdf given by


    $h(y) = \begin{cases} \sum_{k=1}^{n} f[x_k(y)]\, |g'[x_k(y)]|^{-1} & \text{if } n > 0, \\ 0 & \text{if } n = 0. \end{cases}$
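
The sum over inverses in Theorem 4 can be sketched directly. The code
below is in Python; the helper pdf_by_inverses is a hypothetical name, and
the caller is assumed to supply a function that returns the inverses
$x_k(y)$ with nonzero derivative (an empty tuple when n(y) = 0). It is
exercised on $Y = X^2$ and Y = |X| for a standard normal X.

```python
import math

def pdf_by_inverses(f, inverses, g_prime):
    """h(y) = sum over inverses x_k(y) of f(x_k) / |g'(x_k)|."""
    def h(y):
        return sum(f(x) / abs(g_prime(x)) for x in inverses(y))
    return h

def phi(x):
    """Standard normal pdf."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Y = X^2: inverses sqrt(y) and -sqrt(y) for y > 0, none otherwise.
h_square = pdf_by_inverses(
    phi,
    lambda y: (math.sqrt(y), -math.sqrt(y)) if y > 0 else (),
    lambda x: 2 * x,
)

# Y = |X|: inverses y and -y for y > 0; g'(x) is the sign of x.
h_abs = pdf_by_inverses(
    phi,
    lambda y: (y, -y) if y > 0 else (),
    lambda x: math.copysign(1.0, x),
)

print(h_square(1.5), h_abs(0.7))
```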


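Example 9. Let X be an rv with pdf f, and let Y = |X|. Here n(y) = 2,
$x_1(y) = y$, $x_2(y) = -y$ for y > 0, and


    $h(y) = \begin{cases} f(y) + f(-y), & y > 0, \\ 0, & y \le 0. \end{cases}$


Thus, if f(x) = 1/2, $-1 \le x \le 1$, and = 0 otherwise, then


    $h(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{otherwise}. \end{cases}$


If $f(x) = (1/\sqrt{2\pi})\, e^{-x^2/2}$, $-\infty < x < \infty$, then


    $h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}} e^{-y^2/2}, & y > 0, \\ 0, & \text{otherwise}. \end{cases}$


Example 10. Let X be an rv of the continuous type with pdf f, and let
$Y = X^{2m}$, where m is a positive integer. In this case $g(x) = x^{2m}$,
$g'(x) = 2m x^{2m-1} > 0$ for x > 0 and g'(x) < 0 for x < 0. Writing n = 2m,
we see that, for any y > 0, n(y) = 2, $x_1(y) = -y^{1/n}$, $x_2(y) = y^{1/n}$.
It follows that


    $h(y) = f[x_1(y)] \cdot \dfrac{1}{n y^{1 - 1/n}} + f[x_2(y)] \cdot \dfrac{1}{n y^{1 - 1/n}}$
         $= \begin{cases} \dfrac{1}{n y^{1 - 1/n}} [f(y^{1/n}) + f(-y^{1/n})] & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$


In particular, if f is the pdf given in Example 7, then


    $h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\, n y^{1 - 1/n}} \exp\left( -\dfrac{y^{2/n}}{2} \right) & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}$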


Remark 3. The basic formula (1) and the countable additivity of
probability allow us to compute the distribution of Y = g(X) in some
instances even if g has a countable number of inverses. Let
$A \subseteq \mathcal{R}$ and g map A into $B \subseteq \mathcal{R}$. Suppose that A can be
represented as a countable union of disjoint sets $A_k$, k = 1, 2, ....
Then the df of Y is given by


    $P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}$
               $= P\{X \in \bigcup_{k=1}^{\infty} [\{g^{-1}(-\infty, y]\} \cap A_k]\}$
               $= \sum_{k=1}^{\infty} P\{X \in A_k \cap \{g^{-1}(-\infty, y]\}\}$.


If the conditions of Theorem 3 are satisfied by the restriction of g to
each $A_k$, we may obtain the pdf of Y on differentiating the df of Y. We
remind the reader that term-by-term differentiation is permissible if the
differentiated series is uniformly convergent (see Theorem P.2.19).


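Example 11. Let X be an rv with pdf


    $f(x) = \begin{cases} \theta e^{-\theta x}, & x > 0, \\ 0, & x \le 0, \end{cases}$  $\theta > 0$.

Let $Y = \sin X$, and let $\sin^{-1} y$ be the principal value. Then, for
0 < y < 1,

    $P\{\sin X \le y\}$
    $= P\{0 < X \le \sin^{-1} y$ or $(2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y$
          for all integers $n \ge 1\}$
    $= P\{0 < X \le \sin^{-1} y\} + \sum_{n=1}^{\infty} P\{(2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y\}$
    $= 1 - e^{-\theta \sin^{-1} y} + \sum_{n=1}^{\infty} \left[ e^{-\theta((2n-1)\pi - \sin^{-1} y)} - e^{-\theta(2n\pi + \sin^{-1} y)} \right]$
    $= 1 - e^{-\theta \sin^{-1} y} + \left( e^{\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y} \right) \sum_{n=1}^{\infty} e^{-2\theta\pi n}$
    $= 1 - e^{-\theta \sin^{-1} y} + \left( e^{\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y} \right) \dfrac{e^{-2\theta\pi}}{1 - e^{-2\theta\pi}}$
    $= 1 + \dfrac{e^{-\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y}}{1 - e^{-2\pi\theta}}$.


A similar computation can be made for y < 0. It follows that the pdf of Y
is given by


    $h(y) = \begin{cases} \theta e^{-\theta\pi} (1 - e^{-2\theta\pi})^{-1} (1 - y^2)^{-1/2} \left[ e^{\theta \sin^{-1} y} + e^{-\theta\pi - \theta \sin^{-1} y} \right] & \text{if } -1 < y < 0, \\ \theta (1 - e^{-2\theta\pi})^{-1} (1 - y^2)^{-1/2} \left[ e^{-\theta \sin^{-1} y} + e^{-\theta\pi + \theta \sin^{-1} y} \right] & \text{if } 0 < y < 1, \\ 0 & \text{otherwise}. \end{cases}$
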
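
The closed form for $P\{\sin X \le y\}$, 0 < y < 1, can be checked against a
direct summation of the interval probabilities. The sketch below is in
Python; the function names and the truncation at 200 terms are our own
choices, the latter being an assumption that is adequate for the values of
$\theta$ tried here because the terms decay geometrically.

```python
import math

def df_series(y, theta, terms=200):
    """P{sin X <= y}, 0 < y < 1, summed interval by interval."""
    s = math.asin(y)                                # principal value
    F = lambda x: 1 - math.exp(-theta * x)          # df of X for x > 0
    total = F(s)
    for n in range(1, terms + 1):
        total += F(2 * n * math.pi + s) - F((2 * n - 1) * math.pi - s)
    return total

def df_closed(y, theta):
    """Closed form 1 + (e^{-theta pi + theta s} - e^{-theta s}) / (1 - e^{-2 pi theta})."""
    s = math.asin(y)
    num = math.exp(-theta * math.pi + theta * s) - math.exp(-theta * s)
    return 1 + num / (1 - math.exp(-2 * math.pi * theta))

for theta in (0.5, 1.0, 2.0):
    for y in (0.1, 0.5, 0.9):
        print(theta, y, df_series(y, theta), df_closed(y, theta))
```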
PROBLEMS 2.5 


1. Let X be a random variable with probability mass function

    $P\{X = r\} = \binom{n}{r} p^r (1 - p)^{n-r}$,  r = 0, 1, 2, ..., n,  $0 \le p \le 1$.

Find the pmf's of the rv's (a) Y = aX + b, (b) $Y = X^2$, (c) $Y = \sqrt{X}$.
2. Let X be an rv with pdf = 
0 if x < 0, 
f= + ifO0<x<1, El 
3 if 1 < x < œ. 


Find the pdf of the rv 1/X. 
3. Let X be a positive rv of the continuous type with pdf f(-). Find the pdf of 
the rv U = X/(1 + X). If, in particular, X has the pdf. 
0205751; 
fs E otherwise, 
what is the pdf of U? 


4. Let X be an rv with pdf f defined in Example 11. Let Y = cos X, and
Z = tan X. Find the df's and pdf's of Y and Z.


5. Let X be an rv with pdf


    $f(x) = \begin{cases} \theta e^{-\theta x} & \text{if } x \ge 0, \\ 0 & \text{otherwise}. \end{cases}$

Let $Y = [X - 1/\theta]^2$. Find the pdf of Y.

6. A point is chosen at random on the circumference of a circle of radius
r with center at the origin, that is, the polar angle $\theta$ of the point
chosen has the pdf


    $f(\theta) = \dfrac{1}{2\pi}$,  $\theta \in (-\pi, \pi)$.


Find the pdf of the abscissa of the point selected.


7. For the rv X of Example 7 find the pdf of the following rv's:
(a) $Y_1 = e^X$, (b) $Y_2 = 2X^2 + 1$, (c) $Y_3 = g(X)$, where g(x) = 1 if
x > 0, = 1/2 if x = 0, and = -1 if x < 0.


8. Suppose that a projectile is fired at an angle $\theta$ above the earth
with a velocity V. Assuming that $\theta$ is an rv with pdf


    $f(\theta) = \begin{cases} \dfrac{12}{\pi} & \text{if } \dfrac{\pi}{6} < \theta < \dfrac{\pi}{4}, \\ 0 & \text{otherwise}, \end{cases}$


find the pdf of the range R of the projectile, where
$R = V^2 \sin 2\theta / g$, g being the gravitational constant.


9. Let X be an rv with pdf $f(x) = 1/(2\pi)$ if $0 < x < 2\pi$, and = 0
otherwise. Let Y = sin X. Find the df and pdf of Y.


10. Let X be an rv with pdf f(x) = 1/3 if -1 < x < 2, and = 0 otherwise.
Let Y = |X|. Find the pdf of Y.

11. Let X be an rv with pdf $f(x) = 1/(2\theta)$ if $-\theta \le x \le \theta$, and = 0
otherwise. Let $Y = 1/X^2$. Find the pdf of Y.

12. Let X be an rv of the continuous type, and let Y = g(X) be defined as
follows:

(a) g(x) = 1 if x > 0, and = -1 if $x \le 0$.

(b) g(x) = b if $x \ge b$, = x if |x| < b, and = -b if $x \le -b$.

(c) g(x) = x if $|x| \ge b$, and = 0 if |x| < b.

Find the distribution of Y in each case.

13. Prove Theorem 4. 


CHAPTER 3 


Moments and 
Generating Functions 


3.1 INTRODUCTION 


The study of the probability distributions of a random variable is essentially 
the study of some numerical characteristics associated with them. These 
so-called parameters of the distribution play a key role in mathematical 
statistics. In Section 2 we introduce some of these parameters, namely,
moments and order parameters, and investigate their properties. In Section 
3 the idea of generating functions is introduced. In particular, we study 
probability generating functions and moment generating functions. Section 
4 deals with some moment inequalities. 


3.2 MOMENTS OF A DISTRIBUTION FUNCTION 


In this section we investigate some numerical characteristics, called para- 
meters, associated with the distribution of an rv X. These parameters are 
(a) moments and their functions and (b) order parameters. We will concen- 
trate mainly on moments and their properties. 

Let X be a random variable of the discrete type with probability mass
function $p_k = P\{X = x_k\}$, k = 1, 2, .... If


(1)    $\sum_{k=1}^{\infty} |x_k| p_k < \infty$,


we say that the expected value (or the mean or the mathematical
expectation) of X exists and write


(2)    $\mu = EX = \sum_{k=1}^{\infty} x_k p_k$.

Note that the series $\sum_{k=1}^{\infty} x_k p_k$ may converge but the series
$\sum_{k=1}^{\infty} |x_k| p_k$ may not. In that case we say that EX does not
exist.


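Example 1. Let X have the pmf given by

$$p_j = P\left\{X = (-1)^{j+1}\,\frac{3^j}{j}\right\} = \frac{2}{3^j}, \qquad j = 1, 2, \ldots.$$

Then

$$\sum_{j=1}^{\infty} |x_j|\, p_j = \sum_{j=1}^{\infty} \frac{2}{j} = \infty,$$

and EX does not exist, although the series

$$\sum_{j=1}^{\infty} x_j p_j = \sum_{j=1}^{\infty} (-1)^{j+1}\,\frac{2}{j}$$

is convergent.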
If X is of the continuous type and has pdf f, we say that EX exists and
equals $\int x f(x)\,dx$, provided that

$$\int |x|\, f(x)\,dx < \infty.$$

A similar definition is given for the mean of any Borel-measurable function
h(X) of X.

We emphasize that the condition $\int |x| f(x)\,dx < \infty$ must be checked
before it can be concluded that EX exists and equals $\int x f(x)\,dx$. Moreover,
it is worthwhile to recall at this point that the integral $\int_{-\infty}^{\infty} \varphi(x)\,dx$ exists,
provided that the limit $\lim_{a \to \infty,\, b \to \infty} \int_{-b}^{a} \varphi(x)\,dx$ exists. It is quite possible for the
limit $\lim_{a \to \infty} \int_{-a}^{a} \varphi(x)\,dx$ to exist without the existence of $\int_{-\infty}^{\infty} \varphi(x)\,dx$. As
an example consider the Cauchy pdf:

$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad -\infty < x < \infty.$$

Clearly

$$\lim_{a \to \infty} \int_{-a}^{a} \frac{x}{\pi}\,\frac{1}{1 + x^2}\,dx = 0.$$

However, EX does not exist since the integral $(1/\pi) \int_{-\infty}^{\infty} |x|/(1 + x^2)\,dx$
diverges.
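The contrast between the symmetric limit and the absolute integral for the Cauchy pdf can be seen numerically. The following Python sketch is an illustration only: the function names are hypothetical, the symmetric integral is approximated by a midpoint rule, and the absolute integral over $[-a, a]$ uses the closed form $\log(1 + a^2)/\pi$.

```python
import math


def symmetric_truncated_mean(a: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of (1/pi) * integral of x/(1+x^2) over [-a, a]."""
    h = 2 * a / steps
    total = 0.0
    for i in range(steps):
        x = -a + (i + 0.5) * h
        total += x / (1 + x * x)
    return total * h / math.pi


def truncated_abs_moment(a: float) -> float:
    """(1/pi) * integral of |x|/(1+x^2) over [-a, a], in closed form."""
    return math.log1p(a * a) / math.pi


for a in (1e1, 1e3, 1e6):
    # The first value stays near 0; the second keeps growing with a.
    print(a, symmetric_truncated_mean(a), truncated_abs_moment(a))
```

The symmetric estimate stays at zero up to rounding error, while the absolute integral grows like $2 \log a / \pi$, which is why EX is said not to exist.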


Remark 1. Let $X(\omega) = I_A(\omega)$ for some $A \in \mathcal{S}$. Then $EX = P(A)$.


Remark 2. If we write h(X) = |X|, we see that EX exists if and only if E|X|
does.

Remark 3. We say that an rv X is symmetric about a point $\alpha$ if

$$P\{X \ge \alpha + x\} = P\{X \le \alpha - x\} \quad \text{for all } x.$$

In terms of the df F of X, this means that, if

$$F(\alpha - x) = 1 - F(\alpha + x) + P\{X = \alpha + x\}$$

holds for all $x \in \mathcal{R}$, we say that the df F (or the rv X) is symmetric with $\alpha$
as the center of symmetry. If $\alpha = 0$, then for every x

$$F(-x) = 1 - F(x) + P\{X = x\}.$$

In particular, if X is an rv of the continuous type, X is symmetric with
center $\alpha$ if and only if the pdf f of X satisfies

$$f(\alpha - x) = f(\alpha + x) \quad \text{for all } x.$$

If $\alpha = 0$, we will say simply that X is symmetric (or that F is symmetric).

As an immediate consequence of this definition we see that, if X is
symmetric with $\alpha$ as the center of symmetry and $E|X| < \infty$, then $EX = \alpha$.
Examples of symmetric df's are easy to construct, and we will encounter
many such distributions in this book.


Remark 4. If a and b are constants and X is an rv with $E|X| < \infty$,
then $E|aX + b| < \infty$ and $E\{aX + b\} = aEX + b$. In particular, $E\{X - \mu\} = 0$,
a fact that should not come as a surprise.


Remark 5. If X is bounded, that is, $P\{|X| < M\} = 1$, $0 < M < \infty$, then
EX exists.


Remark 6. If $P\{X \ge 0\} = 1$, and EX exists, then $EX \ge 0$.


Theorem 1. Let X be an rv, and g be a Borel-measurable function on $\mathcal{R}$.
Let Y = g(X). Then

(3) $EY = \sum_{x} g(x)\, P\{X = x\}$

in the sense that, if either side of (3) exists, so does the other, and then the
two are equal.


Remark 7. Let X be a discrete rv. Then Theorem 1 says that

$$\sum_{j} g(x_j)\, P\{X = x_j\} = \sum_{k} y_k\, P\{Y = y_k\}$$

in the sense that, if either of the two series converges absolutely, so does
the other, and the two sums are equal. If X is of the continuous type with
pdf f, let h(y) be the pdf of Y = g(X). Then, according to Theorem 1,

$$\int g(x) f(x)\,dx = \int y\, h(y)\,dy,$$

provided that $E|g(X)| < \infty$.


Proof of Theorem 1. Let X be discrete, and suppose that $P\{X \in A\} = 1$.
If y = g(x) is a one-to-one mapping of A onto some set B, then

$$P\{Y = y\} = P\{X = g^{-1}(y)\}, \qquad y \in B.$$

We have

$$\sum_{x \in A} g(x)\, P\{X = x\} = \sum_{y \in B} y\, P\{Y = y\}.$$

If X is of the continuous type with pdf f, and g satisfies the conditions of
Theorem 2.5.3, then

$$\int g(x) f(x)\,dx = \int_{\alpha}^{\beta} y\, f[g^{-1}(y)] \left| \frac{d}{dy}\, g^{-1}(y) \right| dy$$

by changing the variable to y = g(x). Thus

$$\int g(x) f(x)\,dx = \int_{\alpha}^{\beta} y\, h(y)\,dy.$$

In the general case, including the case where g may not be one-to-one, we
refer the reader to Loève [71], page 166.


The functions $h(x) = x^n$, where n is a positive integer, and $h(x) = |x|^{\alpha}$,
where $\alpha$ is a positive real number, are of special importance. If $EX^n$ exists
for some positive integer n, we call $EX^n$ the nth moment of (the distribution
function of) X about the origin. If $E|X|^{\alpha} < \infty$ for some positive real number
$\alpha$, we call $E|X|^{\alpha}$ the $\alpha$th absolute moment of X. We shall use the following
notation:

(4) $m_n = EX^n, \qquad \beta_{\alpha} = E|X|^{\alpha},$

whenever the expectations exist.


Example 2. Let X have the uniform distribution on the first N natural
numbers, that is, let

$$P\{X = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N.$$

Clearly moments of all order exist:

$$EX = \sum_{k=1}^{N} k \cdot \frac{1}{N} = \frac{N + 1}{2},$$

$$EX^2 = \sum_{k=1}^{N} k^2 \cdot \frac{1}{N} = \frac{(N + 1)(2N + 1)}{6}.$$
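The two closed forms for the uniform distribution on $\{1, 2, \ldots, N\}$ can be checked by direct summation in exact arithmetic. The Python sketch below is only an illustration; the helper name `uniform_moment` is hypothetical and does not come from the text.

```python
from fractions import Fraction


def uniform_moment(N: int, r: int) -> Fraction:
    """Return EX^r for X uniform on {1, 2, ..., N}, as an exact fraction."""
    return sum((Fraction(k**r, N) for k in range(1, N + 1)), Fraction(0))


N = 10
# Compare direct summation with (N + 1)/2 and (N + 1)(2N + 1)/6.
print(uniform_moment(N, 1) == Fraction(N + 1, 2))
print(uniform_moment(N, 2) == Fraction((N + 1) * (2 * N + 1), 6))
```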
Example 3. Let X be an rv with pdf

$$f(x) = \begin{cases} \dfrac{2}{x^3}, & x \ge 1, \\ 0, & x < 1. \end{cases}$$

Then

$$EX = \int_1^{\infty} \frac{2}{x^2}\,dx = 2.$$

But

$$EX^2 = \int_1^{\infty} \frac{2}{x}\,dx$$

does not exist. Indeed, it is easily possible to construct examples of random
variables for which all moments of a specified order exist but no higher-
order moments do.


Example 4. Two players, A and B, play a coin-tossing game. A gives B one
dollar if a head turns up; otherwise, B pays A one dollar. If the probability
that the coin shows a head is p, find the expected gain of A.

Let X denote the gain of A. Then

$$P\{X = 1\} = P\{\text{Tails}\} = 1 - p, \qquad P\{X = -1\} = p,$$

and

$$EX = 1 - p - p = 1 - 2p \quad \begin{cases} > 0 & \text{if and only if } p < \tfrac{1}{2}, \\ = 0 & \text{if and only if } p = \tfrac{1}{2}. \end{cases}$$

Thus EX = 0 if and only if the coin is fair.


Theorem 2. If the moment of order t exists for an rv X, moments of order
$0 < s < t$ exist.

Proof. Let X be of the continuous type with pdf f. We have

$$E|X|^s = \int_{|x|^s \le 1} |x|^s f(x)\,dx + \int_{|x|^s > 1} |x|^s f(x)\,dx \le P\{|X|^s \le 1\} + E|X|^t < \infty.$$

A similar proof can be given when X is a discrete rv.


Theorem 3. Let X be an rv on a probability space $(\Omega, \mathcal{S}, P)$. Let $E|X|^k < \infty$
for some $k > 0$. Then

$$n^k P\{|X| > n\} \to 0 \quad \text{as } n \to \infty.$$

Proof. We provide the proof for the case in which X is of the continuous
type with density f. We have

$$\infty > \int |x|^k f(x)\,dx = \lim_{n \to \infty} \int_{|x| \le n} |x|^k f(x)\,dx.$$

It follows that

$$\int_{|x| > n} |x|^k f(x)\,dx \to 0 \quad \text{as } n \to \infty.$$

But

$$\int_{|x| > n} |x|^k f(x)\,dx \ge n^k P\{|X| > n\},$$

completing the proof.

Remark 8. Probabilities of the type $P\{|X| > n\}$ or either of its components,
$P\{X > n\}$ or $P\{X < -n\}$, are called tail probabilities. The result of
Theorem 3, therefore, gives the rate at which $P\{|X| > n\}$ converges to 0 as
$n \to \infty$.
Remark 9. The converse of Theorem 3 does not hold in general, that is,

$$n^k P\{|X| > n\} \to 0 \quad \text{as } n \to \infty \text{ for some } k$$

does not necessarily imply that $E|X|^k < \infty$, for consider the rv

$$P\{X = n\} = \frac{c}{n^2 \log n}, \qquad n = 2, 3, \ldots,$$

where c is a constant determined from

$$\sum_{n=2}^{\infty} \frac{c}{n^2 \log n} = 1.$$

We have

$$P\{X > n\} \approx c \int_n^{\infty} \frac{1}{x^2 \log x}\,dx \approx c\, n^{-1} (\log n)^{-1}$$

and $nP\{X > n\} \to 0$ as $n \to \infty$. (Here and subsequently $\approx$ means that the
ratio of two sides $\to 1$ as $n \to \infty$.) But

$$EX = \sum_{n=2}^{\infty} \frac{c}{n \log n} = \infty.$$

In fact, we need

$$n^{k+\delta} P\{|X| > n\} \to 0 \quad \text{as } n \to \infty$$

for some $\delta > 0$ to ensure that $E|X|^k < \infty$. A condition such as this is
called a moment condition.


For the proof we need the following lemma. 
Lemma 1. Let X be a nonnegative rv with distribution function F. Then

(5) $EX = \int_0^{\infty} [1 - F(x)]\,dx,$


in the sense that, if either side exists, so does the other and the two are 
equal. 


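Proof. If X is of the continuous type with density f and $EX < \infty$, then

$$EX = \int_0^{\infty} x f(x)\,dx = \lim_{n \to \infty} \int_0^n x f(x)\,dx.$$

On integration by parts we obtain

$$\int_0^n x f(x)\,dx = nF(n) - \int_0^n F(x)\,dx = -n[1 - F(n)] + \int_0^n [1 - F(x)]\,dx.$$

But

$$n[1 - F(n)] = n \int_n^{\infty} f(x)\,dx \le \int_n^{\infty} x f(x)\,dx,$$

and, since $E|X| < \infty$, it follows that

$$n[1 - F(n)] \to 0 \quad \text{as } n \to \infty.$$

We have

$$EX = \lim_{n \to \infty} \int_0^n x f(x)\,dx = \lim_{n \to \infty} \int_0^n [1 - F(x)]\,dx = \int_0^{\infty} [1 - F(x)]\,dx.$$

If $\int_0^{\infty} [1 - F(x)]\,dx < \infty$, then

$$\int_0^n x f(x)\,dx \le \int_0^n [1 - F(x)]\,dx \le \int_0^{\infty} [1 - F(x)]\,dx,$$

and it follows that $E|X| < \infty$.

If, on the other hand, X is a discrete rv, let us write $P\{X = x_j\} = p_j$.
Then

$$EX = \sum_j x_j p_j.$$

Let $I = \int_0^{\infty} [1 - F(x)]\,dx$. Then

$$I = \sum_{k=1}^{\infty} \int_{(k-1)/n}^{k/n} P\{X > x\}\,dx,$$

and since $P\{X > x\}$ is nonincreasing, we have

$$\frac{1}{n} \sum_{k=1}^{\infty} P\left\{X > \frac{k}{n}\right\} \le I \le \frac{1}{n} \sum_{k=1}^{\infty} P\left\{X > \frac{k-1}{n}\right\}$$

for every positive integer n. Let

$$U_n = \frac{1}{n} \sum_{k=1}^{\infty} P\left\{X > \frac{k-1}{n}\right\} \quad \text{and} \quad L_n = \frac{1}{n} \sum_{k=1}^{\infty} P\left\{X > \frac{k}{n}\right\}.$$

We have

$$L_n = \frac{1}{n} \sum_{k=1}^{\infty} \sum_{j=k}^{\infty} P\left\{\frac{j}{n} < X \le \frac{j+1}{n}\right\} = \frac{1}{n} \sum_{k=1}^{\infty} (k - 1)\, P\left\{\frac{k-1}{n} < X \le \frac{k}{n}\right\}$$

on rearranging the series. Thus

$$L_n = \sum_{k=1}^{\infty} \frac{k}{n}\, P\left\{\frac{k-1}{n} < X \le \frac{k}{n}\right\} - \frac{1}{n} \sum_{k=1}^{\infty} P\left\{\frac{k-1}{n} < X \le \frac{k}{n}\right\}$$

$$= \sum_{k=1}^{\infty} \frac{k}{n} \sum_{(k-1)/n < x_j \le k/n} p_j - \frac{1}{n}\, P\{X > 0\}$$

$$\ge \sum_{k=1}^{\infty} \sum_{(k-1)/n < x_j \le k/n} x_j p_j - \frac{1}{n} = EX - \frac{1}{n}.$$

Similarly we can show that for every positive integer n

$$U_n \le EX + \frac{1}{n}.$$

Thus we have

$$EX - \frac{1}{n} \le \int_0^{\infty} [1 - F(x)]\,dx \le EX + \frac{1}{n}$$

for every positive integer n. Taking the limit as $n \to \infty$, we see that

$$EX = \int_0^{\infty} [1 - F(x)]\,dx.$$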
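For a nonnegative rv with finitely many atoms, $1 - F$ is a step function, so the integral in (5) reduces to a finite sum of rectangle areas. The following Python sketch, with hypothetical helper names and an arbitrary illustrative pmf, compares that sum with the usual definition of EX in exact arithmetic.

```python
from fractions import Fraction


def mean_from_pmf(pmf: dict) -> Fraction:
    """EX computed directly as the sum of x_j * p_j."""
    return sum((Fraction(x) * p for x, p in pmf.items()), Fraction(0))


def mean_from_tail(pmf: dict) -> Fraction:
    """Integral of 1 - F(x) over [0, inf) for a nonnegative, finitely supported rv."""
    total = Fraction(0)
    tail = Fraction(1)      # P{X > t} for t below the smallest atom
    prev = Fraction(0)
    for x in sorted(pmf):
        total += (Fraction(x) - prev) * tail   # 1 - F is constant on [prev, x)
        tail -= pmf[x]                         # drop the atom at x
        prev = Fraction(x)
    return total


# Illustrative pmf (an assumption, not taken from the text).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)}
print(mean_from_pmf(pmf), mean_from_tail(pmf))
```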


Corollary. For any rv X, $E|X| < \infty$ if and only if the integrals
$\int_{-\infty}^{0} P\{X \le x\}\,dx$ and $\int_0^{\infty} P\{X > x\}\,dx$ both converge, and in that case

$$EX = \int_0^{\infty} P\{X > x\}\,dx - \int_{-\infty}^{0} P\{X \le x\}\,dx.$$

Actually we can get a little more out of Lemma 1 than the above corollary.
In fact,

$$E|X|^{\alpha} = \int_0^{\infty} P\{|X|^{\alpha} > x\}\,dx = \alpha \int_0^{\infty} x^{\alpha - 1} P\{|X| > x\}\,dx,$$

and we see that an rv X possesses an absolute moment of order $\alpha > 0$ if
and only if $|x|^{\alpha - 1} P\{|X| > x\}$ is integrable over $(0, \infty)$.


A simple application of the integral test (see Apostol [3], 361) leads to 
the following moments lemma. 


Lemma 2.

(6) $E|X|^{\alpha} < \infty \iff \sum_{n=1}^{\infty} P\{|X| > n^{1/\alpha}\} < \infty.$

In Section 6.4 we will construct another proof of (6). Note that an immediate
consequence of Lemma 2 is Theorem 3. We are now ready to prove
the following result.


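Theorem 4. Let X be an rv with a distribution satisfying $n^{\alpha} P\{|X| > n\} \to 0$
as $n \to \infty$ for some $\alpha > 0$. Then $E|X|^{\beta} < \infty$ for $0 < \beta < \alpha$.

Proof. Given $\varepsilon > 0$, we can choose an $N = N(\varepsilon)$ such that

$$P\{|X| > n\} < \frac{\varepsilon}{n^{\alpha}} \quad \text{for all } n \ge N.$$

It follows that for $0 < \beta < \alpha$

$$E|X|^{\beta} = \beta \int_0^N x^{\beta - 1} P\{|X| > x\}\,dx + \beta \int_N^{\infty} x^{\beta - 1} P\{|X| > x\}\,dx$$

$$\le N^{\beta} + \beta \varepsilon \int_N^{\infty} x^{\beta - \alpha - 1}\,dx < \infty.$$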


Remark 10. Using Theorems 3 and 4, we demonstrate the existence of
random variables for which moments of any order do not exist, that is, for
which $E|X|^{\alpha} = \infty$ for every $\alpha > 0$. For such an rv $n^{\alpha} P\{|X| > n\} \nrightarrow 0$ as
$n \to \infty$ for any $\alpha > 0$. Consider, for example, the rv X with pdf

$$f(x) = \begin{cases} \dfrac{1}{2|x| (\log |x|)^2} & \text{for } |x| > e, \\ 0 & \text{otherwise.} \end{cases}$$

The df of X is given by

$$F(x) = \begin{cases} \dfrac{1}{2 \log |x|} & \text{if } x \le -e, \\ \dfrac{1}{2} & \text{if } -e < x < e, \\ 1 - \dfrac{1}{2 \log x} & \text{if } x \ge e. \end{cases}$$

Then for $x > e$

$$P\{|X| > x\} = 1 - F(x) + F(-x) = \frac{1}{\log x},$$

and $x^{\alpha} P\{|X| > x\} \to \infty$ as $x \to \infty$ for any $\alpha > 0$. It follows
that $E|X|^{\alpha} = \infty$ for every $\alpha > 0$. In this example we see that
$P\{|X| > cx\}/P\{|X| > x\} \to 1$ as $x \to \infty$ for every $c > 0$. A positive function
$L(\cdot)$ defined on $(0, \infty)$ is said to be a function of slow variation if and only
if $L(cx)/L(x) \to 1$ as $x \to \infty$ for every $c > 0$. For such a function $x^{\alpha} L(x) \to \infty$
for every $\alpha > 0$ (see Feller [29], 275-279). It follows that, if $P\{|X| > x\}$ is
slowly varying, $E|X|^{\alpha} = \infty$ for every $\alpha > 0$. Functions of slow variation
play an important role in the theory of probability. We again refer the
reader to Feller [29].


Random variables for which $P\{|X| > x\}$ is slowly varying are clearly
excluded from the domain of the following result.


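Theorem 5. Let X be an rv satisfying

(7) $\dfrac{P\{|X| > \alpha k\}}{P\{|X| > k\}} \to 0 \quad \text{as } k \to \infty \text{ for all } \alpha > 1;$

then X possesses moments of all orders. (Note that, if $\alpha = 1$, the limit in
(7) is 1, whereas if $\alpha < 1$, the limit will not go to 0 since $P\{|X| > \alpha k\} \ge
P\{|X| > k\}$.)

Proof. Let $\varepsilon > 0$ (we will choose $\varepsilon$ later), choose $k_0$ so large that

(8) $\dfrac{P\{|X| > \alpha k\}}{P\{|X| > k\}} < \varepsilon \quad \text{for all } k \ge k_0,$

and choose $k_1$ so large that

(9) $P\{|X| > k\} < \varepsilon \quad \text{for all } k \ge k_1.$

Let $N = \max(k_0, k_1)$. We have, for a fixed positive integer r,

(10) $\dfrac{P\{|X| > \alpha^r k\}}{P\{|X| > k\}} = \prod_{p=1}^{r} \dfrac{P\{|X| > \alpha^p k\}}{P\{|X| > \alpha^{p-1} k\}} < \varepsilon^r$

for $k \ge N$. Thus for $k \ge N$ we have, in view of (9),

(11) $P\{|X| > \alpha^r k\} < \varepsilon^{r+1}.$

Next note that, for any fixed positive integer n,

(12) $E|X|^n = n \int_0^{\infty} x^{n-1} P\{|X| > x\}\,dx = n \int_0^N x^{n-1} P\{|X| > x\}\,dx + n \int_N^{\infty} x^{n-1} P\{|X| > x\}\,dx.$

Since the first integral in (12) is finite, we need only show that the second
integral is also finite. We have

$$\int_N^{\infty} x^{n-1} P\{|X| > x\}\,dx = \sum_{r=1}^{\infty} \int_{\alpha^{r-1} N}^{\alpha^r N} x^{n-1} P\{|X| > x\}\,dx$$

$$\le \sum_{r=1}^{\infty} (\alpha^r N)^{n-1}\, \varepsilon^r \cdot 2 \alpha^r N = 2N^n \sum_{r=1}^{\infty} (\varepsilon \alpha^n)^r = 2N^n\, \frac{\varepsilon \alpha^n}{1 - \varepsilon \alpha^n} < \infty,$$

provided that we choose $\varepsilon$ such that $\varepsilon \alpha^n < 1$. It follows that $E|X|^n < \infty$
for $n = 1, 2, \ldots$. Actually we have shown that (7) implies $E|X|^{\delta} < \infty$ for
all $\delta > 0$.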

Theorem 6. If $h_1, h_2, \ldots, h_n$ are Borel-measurable functions of an rv X and
$Eh_i(X)$ exists for $i = 1, 2, \ldots, n$, then $E\{\sum_{i=1}^{n} h_i(X)\}$ exists and equals
$\sum_{i=1}^{n} Eh_i(X)$.

Proof. The proof is simple.

Definition 1. Let k be a positive integer, and c be a constant. If $E(X - c)^k$
exists, we call it the moment of order k about the point c. If we take
$c = EX = \mu$, which exists since $E|X|^k < \infty$, we call $E(X - \mu)^k$ the central
moment of order k or the moment of order k about the mean. We shall write

$$\mu_k = E(X - \mu)^k.$$

If we know $m_1, m_2, \ldots, m_k$, we can compute $\mu_1, \mu_2, \ldots, \mu_k$, and conversely.
We have

(13) $\mu_k = E(X - \mu)^k = m_k - \binom{k}{1} \mu\, m_{k-1} + \binom{k}{2} \mu^2 m_{k-2} - \cdots + (-1)^k \mu^k$

and

(14) $m_k = E(X - \mu + \mu)^k = \mu_k + \binom{k}{1} \mu\, \mu_{k-1} + \binom{k}{2} \mu^2 \mu_{k-2} + \cdots + \mu^k.$

The case k = 2 is of special importance.
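The conversion from moments about the origin to central moments is a direct binomial expansion, and can be sketched in a few lines of Python. The function name `central_from_raw` is hypothetical, and the test values below are the Poisson moments for $\lambda = 2$ quoted in Problem 3.2.9, used here purely as an illustration.

```python
from math import comb


def central_from_raw(m: list, k: int):
    """Central moment of order k from raw moments m[0..k], where m[0] = 1, m[1] = EX."""
    mu = m[1]
    # Binomial expansion of E(X - mu)^k in powers of mu.
    return sum(comb(k, j) * (-mu) ** j * m[k - j] for j in range(k + 1))


# Raw moments of a Poisson rv with lambda = 2: 1, 2, 6, 22, 94.
raw = [1, 2, 6, 22, 94]
print([central_from_raw(raw, k) for k in (2, 3, 4)])
```

For the Poisson case the second and third central moments should both equal $\lambda$ and the fourth should equal $\lambda + 3\lambda^2$.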
Definition 2. If $EX^2$ exists, we call $E(X - \mu)^2$ the variance of X, and we
write $\sigma^2 = \mathrm{var}(X) = E(X - \mu)^2$. The quantity $\sigma$ is called the standard
deviation (SD) of X.

From Theorem 6 we see that

(15) $\sigma^2 = \mu_2 = EX^2 - (EX)^2.$

Variance has some important properties.

Theorem 7. $\mathrm{Var}(X) = 0$ if and only if X is degenerate.

Theorem 8. $\mathrm{Var}(X) < E(X - c)^2$ for any $c \ne EX$.

Proof. We have

$$\mathrm{var}(X) = E(X - \mu)^2 = E(X - c)^2 - (c - \mu)^2.$$

Note that

$$\mathrm{var}(aX + b) = a^2\, \mathrm{var}(X).$$

Let $E|X|^2 < \infty$. Then we define

(16) $Z = \dfrac{X - EX}{\sqrt{\mathrm{var}(X)}} = \dfrac{X - \mu}{\sigma}$

and see that EZ = 0 and var(Z) = 1. We call Z a standardized rv.
Example 5. Let X be an rv with binomial pmf

$$P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n; \quad 0 < p < 1.$$

Then

$$EX = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k} = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1 - p)^{n-k} = np;$$

$$EX^2 = E\{X(X - 1) + X\} = \sum_{k=0}^{n} k(k - 1) \binom{n}{k} p^k (1 - p)^{n-k} + np = n(n - 1)p^2 + np;$$

$$\mathrm{var}(X) = n(n - 1)p^2 + np - n^2 p^2 = np(1 - p);$$

$$EX^3 = E\{X(X - 1)(X - 2) + 3X(X - 1) + X\} = n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np;$$

$$\mu_3 = m_3 - 3\mu m_2 + 2\mu^3$$
$$= n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np - 3np[n(n - 1)p^2 + np] + 2n^3 p^3$$
$$= np(1 - p)(1 - 2p).$$
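The binomial moment formulas above lend themselves to an exact check by summing over the pmf. The following Python sketch is illustrative only; `binomial_raw_moment` is a hypothetical helper, and the values n = 5, p = 1/3 are arbitrary choices.

```python
from fractions import Fraction
from math import comb


def binomial_raw_moment(n: int, p: Fraction, r: int) -> Fraction:
    """EX^r for the binomial pmf, by direct summation over k = 0, ..., n."""
    q = 1 - p
    return sum(
        (Fraction(k**r) * comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)),
        Fraction(0),
    )


n, p = 5, Fraction(1, 3)
m1, m2, m3 = (binomial_raw_moment(n, p, r) for r in (1, 2, 3))
variance = m2 - m1**2
mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
# Compare with np(1 - p) and np(1 - p)(1 - 2p).
print(variance == n * p * (1 - p), mu3 == n * p * (1 - p) * (1 - 2 * p))
```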


We have seen that for some distributions even the mean does not exist.
We next consider some parameters, called order parameters, which always 
exist. 


Definition 3. A number x satisfying

(17) $P\{X \le x\} \ge p, \qquad P\{X \ge x\} \ge 1 - p, \qquad 0 < p < 1,$

is called a quantile of order p [or (100p)th percentile] for the rv X (or for
the df F of X). We write $\zeta_p(X)$ for a quantile of order p for the rv X.

If x is a quantile of order p for an rv X with df F, then

(18) $p \le F(x) \le p + P\{X = x\}.$

If $P\{X = x\} = 0$, as is the case, in particular, if X is of the continuous
type, a quantile of order p is a solution of the equation

(19) $F(x) = p.$

If F is strictly increasing, (19) has a unique solution. Otherwise there may
be many (even uncountably many) solutions of (19), each one of which is
then called a quantile of order p.

Definition 4. Let X be an rv with df F. A number x satisfying

(20) $\tfrac{1}{2} \le F(x) \le \tfrac{1}{2} + P\{X = x\}$

or, equivalently,

(21) $P\{X \le x\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X \ge x\} \ge \tfrac{1}{2}$

is called a median of X (or F).


Again we note that there may be many values that satisfy (20) or (21). 
Thus a median is not necessarily unique. 

If F is a symmetric df, the center of symmetry is clearly the median of the
df F. The median is an important centering constant especially in cases where 
the mean of the distribution does not exist. 


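Example 6. Let X be an rv with Cauchy pdf

$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad -\infty < x < \infty.$$

Then E|X| is not finite. The median of the rv X is clearly x = 0.

Example 7. Let X be an rv with pmf

$$P\{X = -2\} = P\{X = 0\} = \tfrac{1}{4}, \qquad P\{X = 1\} = \tfrac{1}{3}, \qquad P\{X = 2\} = \tfrac{1}{6}.$$

Then

$$P\{X \le 0\} = \tfrac{1}{2} \quad \text{and} \quad P\{X \ge 0\} = \tfrac{3}{4} > \tfrac{1}{2}.$$

In fact, if x is any number such that $0 < x < 1$, then

$$P\{X \le x\} = P\{X = -2\} + P\{X = 0\} = \tfrac{1}{2}$$

and

$$P\{X \ge x\} = P\{X = 1\} + P\{X = 2\} = \tfrac{1}{2},$$

and it follows that every x, $0 \le x < 1$, is a median of the rv X.
If p = .2, the quantile of order p is x = -2, since

$$P\{X \le -2\} = \tfrac{1}{4} > p \quad \text{and} \quad P\{X \ge -2\} = 1 > 1 - p.$$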
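The two inequalities in (17) are easy to test mechanically for a discrete rv. The sketch below, in Python with a hypothetical helper `is_quantile`, applies them to the pmf of Example 7; it is an illustration of the definition rather than a general quantile routine.

```python
from fractions import Fraction


def is_quantile(pmf: dict, x, p) -> bool:
    """Check P{X <= x} >= p and P{X >= x} >= 1 - p for a discrete pmf."""
    left = sum(pr for v, pr in pmf.items() if v <= x)    # P{X <= x}
    right = sum(pr for v, pr in pmf.items() if v >= x)   # P{X >= x}
    return left >= p and right >= 1 - p


# pmf of Example 7.
pmf = {-2: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 3), 2: Fraction(1, 6)}
half = Fraction(1, 2)
print(is_quantile(pmf, Fraction(1, 2), half))   # a point strictly between 0 and 1
print(is_quantile(pmf, -2, Fraction(1, 5)))     # the quantile of order .2
```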


PROBLEMS 3.2 


1. Find the expected number of throws of a fair die until a 6 is obtained. 


2. From a box containing N identical tickets numbered 1 through N, n tickets 
are drawn with replacement. Let X be the largest number drawn. Find EX. 


3. Let X be an rv with pdf

$$f(x) = \frac{c}{(1 + x^2)^m}, \qquad -\infty < x < \infty, \quad m \ge 1,$$

where $c = \Gamma(m)/[\Gamma(1/2)\Gamma(m - 1/2)]$. Show that $EX^{2r}$ exists if and only if $2r < 2m - 1$.
What is $EX^{2r}$ if $2r < 2m - 1$?

4. Let X be an rv with pdf

$$f(x) = \begin{cases} \dfrac{k \alpha^k}{(x + \alpha)^{k+1}} & \text{if } x \ge 0, \\ 0 & \text{otherwise } (\alpha > 0). \end{cases}$$

Show that $E|X|^a < \infty$ for $a < k$. Find the quantile of order p for the rv X.


5. Let X be an rv such that $E|X| < \infty$. Show that $E|X - c|$ is minimized if we choose
c equal to the median of the distribution of X.


6. Pareto's distribution with parameters $\alpha$ and $\beta$ (both $\alpha$ and $\beta$ positive) is defined
by the pdf

$$f(x) = \begin{cases} \dfrac{\beta \alpha^{\beta}}{x^{\beta + 1}} & \text{if } x \ge \alpha, \\ 0 & \text{if } x < \alpha. \end{cases}$$

Show that the moment of order n exists if and only if $n < \beta$. Let $\beta > 2$. Find the
mean and the variance of the distribution.


7. For an rv X with pdf

$$f(x) = \begin{cases} \tfrac{1}{2} x & \text{if } 0 \le x < 1, \\ \tfrac{1}{2} & \text{if } 1 < x \le 2, \\ \tfrac{1}{2}(3 - x) & \text{if } 2 < x \le 3, \end{cases}$$

show that moments of all order exist. Find the mean and the variance of X.

8. For the pmf of Example 5 show that

$$EX^4 = np + 7n(n - 1)p^2 + 6n(n - 1)(n - 2)p^3 + n(n - 1)(n - 2)(n - 3)p^4$$

and

$$\mu_4 = 3(npq)^2 + npq(1 - 6pq),$$

where $0 \le p \le 1$, $q = 1 - p$.

9. For the Poisson rv X with pmf

$$P\{X = x\} = e^{-\lambda}\, \frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots,$$

show that $EX = \lambda$, $EX^2 = \lambda + \lambda^2$, $EX^3 = \lambda + 3\lambda^2 + \lambda^3$, $EX^4 = \lambda + 7\lambda^2 + 6\lambda^3 + \lambda^4$,
and $\mu_2 = \mu_3 = \lambda$, $\mu_4 = \lambda + 3\lambda^2$.

10. For any rv X with $E|X|^4 < \infty$ define

$$\alpha_3 = \frac{\mu_3}{(\mu_2)^{3/2}}, \qquad \alpha_4 = \frac{\mu_4}{\mu_2^2}.$$

Here $\alpha_3$ is known as the coefficient of skewness and is sometimes used as a measure
of asymmetry, and $\alpha_4$ is known as kurtosis and is used to measure the peakedness
("flatness of the top") of a distribution.
Compute $\alpha_3$ and $\alpha_4$ for the pmf's of Problems 8 and 9.


11. For a positive rv X define the negative moment of order n by $EX^{-n}$, where
$n > 0$ is an integer. Find $E\{1/(X + 1)\}$ for the pmf's of Example 5 and Problem 9.


12. Prove Theorem 6. 
13. Prove Theorem 7. 


3.3 GENERATING FUNCTIONS 


In this section we consider some functions that generate probabilities or
moments of an rv. The simplest type of generating function in probability
theory is the one associated with integer-valued rv's. Let X be an rv, and let

$$p_k = P\{X = k\}, \qquad k = 0, 1, 2, \ldots,$$

with $\sum_{k=0}^{\infty} p_k = 1$.

Definition 1. The function defined by

(1) $P(s) = \sum_{k=0}^{\infty} p_k s^k,$

which surely converges for $|s| \le 1$, is called the probability generating function
(pgf) of X.


Example 1. Consider the Poisson rv

$$P\{X = k\} = e^{-\lambda}\, \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots.$$

We have

$$P(s) = \sum_{k=0}^{\infty} (s\lambda)^k\, \frac{e^{-\lambda}}{k!} = e^{-\lambda} e^{s\lambda} = e^{-\lambda(1 - s)}, \qquad |s| \le 1.$$

Example 2. Let X be an rv with geometric distribution, that is, let

$$P\{X = k\} = pq^k, \qquad k = 0, 1, 2, \ldots; \quad 0 < p < 1, \ q = 1 - p.$$

Then

$$P(s) = \sum_{k=0}^{\infty} s^k p q^k = \frac{p}{1 - sq}, \qquad |s| \le 1.$$


Remark 1. Since P(1) = 1, series (1) is uniformly and absolutely convergent
in $|s| \le 1$ and the pgf P is a continuous function of s. It determines the
pmf uniquely, since P(s) can be represented in a unique manner as a power
series.


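Remark 2. The moments of the rv X, if they exist, can be determined by
the derivative at the point s = 1 of the function P(s). Thus

$$P'(s) = \sum_{k=1}^{\infty} k p_k s^{k-1}, \quad \text{so that } P'(1) = EX \text{ if } EX < \infty,$$

$$P''(s) = \sum_{k=2}^{\infty} k(k - 1) p_k s^{k-2}, \quad \text{so that } P''(1) = E\{X(X - 1)\} \text{ if } EX^2 < \infty,$$

and so on.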
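The recipe of Remark 2 can be tried on the geometric pgf $P(s) = p/(1 - sq)$ of Example 2, whose first two derivatives are available in closed form. The Python sketch below uses hypothetical function names and the arbitrary value p = 1/4; the truncated series is only a rough numerical cross-check.

```python
from fractions import Fraction


def geometric_mean_var_from_pgf(p: Fraction):
    """Mean and variance of P{X = k} = p q^k from derivatives of p / (1 - s q) at s = 1."""
    q = 1 - p
    d1 = p * q / (1 - q) ** 2            # P'(1) = EX
    d2 = 2 * p * q**2 / (1 - q) ** 3     # P''(1) = E{X(X - 1)}
    variance = d2 + d1 - d1**2           # EX^2 - (EX)^2, with EX^2 = P''(1) + P'(1)
    return d1, variance


def geometric_mean_direct(p: float, terms: int = 5000) -> float:
    """Truncated series sum of k p q^k, used as a numerical cross-check."""
    q = 1.0 - p
    return sum(k * p * q**k for k in range(terms))


print(geometric_mean_var_from_pgf(Fraction(1, 4)), geometric_mean_direct(0.25))
```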


Example 3. In Example 1 we found that $P(s) = e^{-\lambda(1 - s)}$, $|s| \le 1$, for a
Poisson rv. Thus

$$P'(s) = \lambda e^{-\lambda(1 - s)}, \qquad P''(s) = \lambda^2 e^{-\lambda(1 - s)}.$$

Also, $EX = \lambda$, $E\{X^2 - X\} = \lambda^2$, so that $\mathrm{var}(X) = EX^2 - (EX)^2 =
\lambda^2 + \lambda - \lambda^2 = \lambda$.

In Example 2 we computed $P(s) = p/(1 - sq)$, so that

$$P'(s) = \frac{pq}{(1 - sq)^2} \quad \text{and} \quad P''(s) = \frac{2pq^2}{(1 - sq)^3}.$$

Thus

$$EX = \frac{q}{p}, \qquad EX^2 = \frac{q}{p} + \frac{2pq^2}{p^3}, \qquad \mathrm{var}(X) = \frac{q^2}{p^2} + \frac{q}{p} = \frac{q}{p^2}.$$

Definition 2. Let X be an rv defined on $(\Omega, \mathcal{S}, P)$. The function

(2) $M(s) = Ee^{sX}$

is known as the moment generating function (mgf) of the rv X if the
expectation on the right side of (2) exists in some neighborhood of the
origin.


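Example 4. Let X have the pmf

$$P\{X = k\} = \begin{cases} \dfrac{6}{\pi^2}\, \dfrac{1}{k^2}, & k = 1, 2, \ldots, \\ 0, & \text{otherwise.} \end{cases}$$

Then $(6/\pi^2) \sum_{k=1}^{\infty} e^{sk}/k^2$ is infinite for every $s > 0$. We see that the mgf of
X does not exist. In fact, $EX = \infty$.


Example 5. Let X have the pdf

$$f(x) = \begin{cases} \tfrac{1}{2} e^{-x/2}, & x > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$M(s) = \frac{1}{2} \int_0^{\infty} e^{(s - 1/2)x}\,dx = \frac{1}{1 - 2s}, \qquad s < \tfrac{1}{2}.$$

Example 6. Let X have pmf

$$P\{X = k\} = \begin{cases} e^{-\lambda}\, \dfrac{\lambda^k}{k!}, & k = 0, 1, 2, \ldots, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$M(s) = Ee^{sX} = e^{-\lambda} \sum_{k=0}^{\infty} e^{sk}\, \frac{\lambda^k}{k!} = e^{-\lambda(1 - e^s)} \quad \text{for all } s.$$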
The following result will be quite useful in what follows. 


Theorem 1. The mgf uniquely determines a df and, conversely, if the mgf
exists, it is unique.


For the proof we refer the reader to Widder [137], page 460, or Curtiss
[20]. See also P.2.14. Theorem 2 explains why we call M(s) an mgf.


Theorem 2. If the mgf M(s) of an rv X exists for s in $(-s_0, s_0)$, say, $s_0 > 0$,
the derivatives of all order exist at s = 0 and can be evaluated under the
integral sign, that is,

(3) $M^{(k)}(s)\big|_{s=0} = EX^k$ for positive integral k.

For the proof of Theorem 2 we refer to Widder [137], pages 446-447. See
also P.2.14 and Problem 9.


Remark 3. Alternatively, if the mgf M(s) exists for s in $(-s_0, s_0)$, say, $s_0 > 0$,
one can express M(s) (uniquely) in a Maclaurin series expansion:

(4) $M(s) = M(0) + \dfrac{M'(0)}{1!}\, s + \dfrac{M''(0)}{2!}\, s^2 + \cdots,$

so that $EX^k$ is the coefficient of $s^k/k!$ in expansion (4).
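As an illustration of reading moments off expansion (4), take the mgf $M(s) = 1/(1 - 2s)$ of Example 5. Its geometric series is $\sum_k (2s)^k = \sum_k (2^k k!)\, s^k/k!$, which suggests $EX^k = 2^k k!$. The Python sketch below, with hypothetical function names and an ad hoc midpoint rule, compares this with a numerical integral against the pdf $\tfrac{1}{2} e^{-x/2}$.

```python
from math import exp, factorial


def moment_from_series(k: int) -> int:
    """Coefficient of s^k / k! in the expansion of 1 / (1 - 2s)."""
    return 2**k * factorial(k)


def moment_by_integration(k: int, upper: float = 200.0, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the integral of x^k (1/2) e^{-x/2} over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**k * 0.5 * exp(-x / 2)
    return total * h


for k in (1, 2, 3):
    print(k, moment_from_series(k), moment_by_integration(k))
```

The truncation point and step count are arbitrary choices that are adequate for small k only.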


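Example 7. Let X be an rv with pdf $f(x) = (1/2)e^{-x/2}$, $x > 0$. From
Example 5, $M(s) = 1/(1 - 2s)$ for $s < 1/2$. Thus

$$M'(s) = \frac{2}{(1 - 2s)^2} \quad \text{and} \quad M''(s) = \frac{4 \cdot 2}{(1 - 2s)^3}, \qquad s < \tfrac{1}{2}.$$

It follows that

$$EX = 2, \qquad EX^2 = 8, \qquad \text{and} \quad \mathrm{var}(X) = 4.$$


Example 8. Let X be an rv with pdf $f(x) = 1$, $0 \le x \le 1$, and $= 0$ otherwise.
Then

$$M(s) = \int_0^1 e^{sx}\,dx = \frac{e^s - 1}{s}, \qquad \text{all } s,$$

$$M'(s) = \frac{e^s \cdot s - (e^s - 1) \cdot 1}{s^2},$$

$$EX = M'(0) = \lim_{s \to 0} \frac{s e^s - e^s + 1}{s^2} = \frac{1}{2}.$$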


Remark 4. Since there exist rv's for which the mgf may not exist, its utility
is somewhat limited. It is much more convenient to work with the characteristic
function of an rv X, which is defined as $E(e^{itX})$, where $i = \sqrt{-1}$, the
imaginary unit, and t is any real number. $E(e^{itX})$ exists for every distribution.
Moreover, it uniquely determines the distribution of rv X. Since we do not
assume a knowledge of complex variables in this book, we will deal with only
mgf's whenever they exist.

We next consider the problem of characterizing a distribution from its
moments. Let X be an rv with mgf M(s). Since $n!\, e^{|sx|} \ge |sx|^n$ for $n \ge 0$, n
integral, we see that $E|X|^n < \infty$ for any n. Given the mgf M(s), we can
determine $EX^n$ for any n (positive integer) with the help of Theorem 2.
Suppose now that moments of all orders exist for an rv X. It does not follow
that the mgf exists.

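Example 9. Let X be an rv with pdf

$$f(x) = c\, e^{-|x|^{\alpha}}, \qquad 0 < \alpha < 1, \quad -\infty < x < \infty,$$

where c is a constant determined from

$$c \int_{-\infty}^{\infty} e^{-|x|^{\alpha}}\,dx = 1.$$

Let $s > 0$. Then

$$\int_0^{\infty} e^{sx} e^{-x^{\alpha}}\,dx = \int_0^{\infty} e^{x(s - x^{\alpha - 1})}\,dx$$

and since $\alpha - 1 < 0$, $\int_0^{\infty} e^{sx} e^{-x^{\alpha}}\,dx$ is not finite for any $s > 0$. Hence the
mgf does not exist. But

$$E|X|^n = c \int_{-\infty}^{\infty} |x|^n e^{-|x|^{\alpha}}\,dx = 2c \int_0^{\infty} x^n e^{-x^{\alpha}}\,dx < \infty \quad \text{for each } n,$$

as is easily checked by substituting $y = x^{\alpha}$.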


Theorem 3. Let $\{m_k\}$ be the moment sequence of an rv X. If the series

(5) $\sum_{k=1}^{\infty} \dfrac{m_k}{k!}\, s^k$

converges absolutely for some $s > 0$, then $\{m_k\}$ uniquely determines the
df F of X.


The proof of this result is much too complicated to be included here, and
we refer the reader to original papers by Hamburger [46]. It should be noted
that condition (5) is not necessary (see Dharmadhikari [25]).

In particular if for some constant c

$$|m_k| \le c^k, \qquad k = 1, 2, \ldots,$$

then

$$\sum_{k=1}^{\infty} \frac{|m_k|}{k!}\, s^k \le \sum_{k=1}^{\infty} \frac{(cs)^k}{k!} < e^{cs} \quad \text{for } s > 0,$$

and the df of X is uniquely determined.

If not all the moments of an rv X exist, there is no chance of determining
the df of X. The df of X is surely not determined uniquely by the moments
that do exist.


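Example 10. Let X be an rv with pmf

$$P\left\{X = \frac{2^k}{k^2}\right\} = \frac{1}{2^k}, \qquad k = 1, 2, \ldots.$$

Then

$$EX = \sum_{k=1}^{\infty} \frac{1}{k^2} \quad \text{and} \quad EX^2 = \sum_{k=1}^{\infty} \frac{2^k}{k^4} = \infty.$$

Let Y be an rv with pmf

$$P\{Y = 0\} = \frac{1}{2}, \quad \text{and} \quad P\left\{Y = \frac{2^{k+1}}{k^2}\right\} = \frac{1}{2^{k+1}}, \qquad k = 1, 2, \ldots.$$

Then

$$EY = \sum_{k=1}^{\infty} \frac{1}{k^2} \quad \text{and} \quad EX = EY.$$

But

$$EY^2 = \sum_{k=1}^{\infty} \frac{2^{k+1}}{k^4} = \infty,$$

and X and Y do not have the same distribution.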
Finally we mention some sufficient conditions for a moment sequence to
determine a unique df.

(i) The range of the rv is finite.

(ii) (Carleman) $\sum_{k=1}^{\infty} (m_{2k})^{-1/2k} = \infty$ when the range of the rv is
$(-\infty, \infty)$. If the range is $(0, \infty)$, a sufficient condition is
$\sum_{k=1}^{\infty} (m_k)^{-1/2k} = \infty$.

(iii) $\limsup_{n \to \infty} \{(m_{2n})^{1/2n}/2n\}$ is finite.
PROBLEMS 3.3 
1. Find the pgf of the rv's with the following pmf's:

(a) $P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}$, $k = 0, 1, 2, \ldots, n$; $0 \le p \le 1$.

(b) $P\{X = k\} = [e^{-\lambda}/(1 - e^{-\lambda})](\lambda^k/k!)$, $k = 1, 2, \ldots$; $\lambda > 0$.

(c) $P\{X = k\} = pq^k (1 - q^{N+1})^{-1}$, $k = 0, 1, 2, \ldots, N$; $0 < p < 1$, $q = 1 - p$.

2. Let X be an integer-valued rv with pgf P(s). Let a and b be nonnegative
integers, and write Y = aX + b. Find the pgf of Y.

3. Let X be an integer-valued rv with pgf P(s), and suppose that the mgf M(s)
exists for $s \in (-s_0, s_0)$, $s_0 > 0$. How are M(s) and P(s) related? Using
$M^{(k)}(s)|_{s=0} = EX^k$ for positive integral k, find $EX^k$ in terms of the derivatives of
P(s) for values of k = 1, 2, 3, 4.


4. For the Cauchy pdf

$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad -\infty < x < \infty,$$

does the mgf exist?

5. Let X be an rv with pmf

$$P\{X = j\} = p_j, \qquad j = 0, 1, 2, \ldots.$$

Set $P\{X > j\} = q_j$, $j = 0, 1, 2, \ldots$. Clearly $q_j = p_{j+1} + p_{j+2} + \cdots$, $j \ge 0$. Write
$Q(s) = \sum_{j=0}^{\infty} q_j s^j$. Then the series for Q(s) converges in $|s| < 1$. Show that

$$Q(s) = \frac{1 - P(s)}{1 - s} \quad \text{for } |s| < 1,$$

where P(s) is the pgf of X. Find the mean and the variance of X (when they exist)
in terms of Q and its derivatives.


6. For the pmf

$$P\{X = j\} = \frac{a_j \theta^j}{f(\theta)}, \qquad j = 0, 1, 2, \ldots, \quad \theta > 0,$$

where $a_j \ge 0$ and $f(\theta) = \sum_{j=0}^{\infty} a_j \theta^j$, find the pgf and the mgf in terms of f.

7. For the Laplace pdf

$$f(x) = \frac{1}{2\lambda}\, e^{-|x - \mu|/\lambda}, \qquad -\infty < x < \infty; \quad \lambda > 0, \ -\infty < \mu < \infty,$$

show that the mgf exists and equals

$$M(t) = (1 - \lambda^2 t^2)^{-1} e^{\mu t}, \qquad |t| < \frac{1}{\lambda}.$$

8. For any integer-valued rv X, show that

$$\sum_{n=0}^{\infty} s^n P\{X \le n\} = (1 - s)^{-1} P(s),$$

where P is the pgf of X.

9. Let X be an rv with mgf M(t), which exists for $t \in (-t_0, t_0)$, $t_0 > 0$. Show that

$$E|X|^n < n!\, s^{-n} [M(s) + M(-s)]$$

for any fixed s, $0 < s < t_0$, and for each integer $n \ge 1$. Expanding $e^{tX}$ in a
power series, show that, for $t \in (-s, s)$, $0 < s < t_0$,

$$M(t) = \sum_{n=0}^{\infty} t^n\, \frac{EX^n}{n!}.$$

(Since a power series can be differentiated term by term within the interval of
convergence, it follows that for $|t| < s$,

$$M^{(k)}(t)\big|_{t=0} = EX^k$$

for each integer $k \ge 1$.)
(Roy, LePage, and Moore [106])


3.4 SOME MOMENT INEQUALITIES 


In this section we derive some inequalities for moments of an rv. The main 
result of this section is Theorem 1 (and its corollary), which gives a bound 
for tail probability in terms of some moment of the random variable. 


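Theorem 1. Let h(X) be a nonnegative Borel-measurable function of an rv
X. If Eh(X) exists, then, for every $\varepsilon > 0$,

(1) $P\{h(X) \ge \varepsilon\} \le \dfrac{Eh(X)}{\varepsilon}.$

Proof. We prove the result when X is discrete. Let $P\{X = x_k\} = p_k$,
$k = 1, 2, \ldots$. Then

$$Eh(X) = \sum_k h(x_k) p_k = \left(\sum_{A} + \sum_{A^c}\right) h(x_k) p_k,$$

where

$$A = \{k: h(x_k) \ge \varepsilon\}.$$

Then

$$Eh(X) \ge \sum_{A} h(x_k) p_k \ge \varepsilon \sum_{A} p_k = \varepsilon P\{h(X) \ge \varepsilon\}.$$

Corollary. Let $h(X) = |X|^r$ and $\varepsilon = K^r$, where $r > 0$ and $K > 0$. Then

(2) $P\{|X| \ge K\} \le \dfrac{E|X|^r}{K^r},$

which is Markov's inequality. In particular, if we take $h(X) = (X - \mu)^2$,
$\varepsilon = K^2 \sigma^2$, we get Chebychev's inequality:

(3) $P\{|X - \mu| \ge K\sigma\} \le \dfrac{1}{K^2},$

where $EX = \mu$, $\mathrm{var}(X) = \sigma^2$.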

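For rv's with finite second-order moments one cannot do better than the
inequality in (3).


Example 1. Let

$$P\{X = 0\} = 1 - \frac{1}{K^2}, \qquad P\{X = \pm 1\} = \frac{1}{2K^2}, \qquad K > 1, \text{ constant.}$$

Then

$$EX = 0, \qquad EX^2 = \frac{1}{K^2}, \qquad \sigma = \frac{1}{K},$$

$$P\{|X| \ge K\sigma\} = P\{|X| \ge 1\} = \frac{1}{K^2},$$

so that equality is achieved.


Example 2. Let X be distributed with pdf $f(x) = 1$ if $0 < x < 1$, and $= 0$
otherwise. Then

$$EX = \tfrac{1}{2}, \qquad EX^2 = \tfrac{1}{3}, \qquad \mathrm{var}(X) = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12},$$

$$P\left\{\left|X - \tfrac{1}{2}\right| < 2\sqrt{\tfrac{1}{12}}\right\} = P\left\{\tfrac{1}{2} - \tfrac{1}{\sqrt{3}} < X < \tfrac{1}{2} + \tfrac{1}{\sqrt{3}}\right\} = 1.$$

From Chebychev's inequality

$$P\left\{\left|X - \tfrac{1}{2}\right| < 2\sqrt{\tfrac{1}{12}}\right\} \ge 1 - \tfrac{1}{4} = .75.$$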


It is possible to improve upon Chebychev's inequality, at least in some 
cases, if we assume the existence of higher-order moments. We need the 
following lemma. 


Lemma 1. Let X be an rv with EX = 0 and $\mathrm{var}(X) = \sigma^2$. Then

(4) $P\{X \ge x\} \le \dfrac{\sigma^2}{\sigma^2 + x^2}, \quad \text{if } x > 0,$

(5) $P\{X > x\} \ge \dfrac{x^2}{\sigma^2 + x^2}, \quad \text{if } x < 0.$


Proof. Let $h(t) = (t + c)^2$, $c > 0$. Then $h(t) \ge 0$ for all t and

$$h(t) \ge (x + c)^2 \quad \text{for } t \ge x > 0.$$

It follows that

(6) $P\{X \ge x\} \le P\{h(X) \ge (x + c)^2\} \le \dfrac{E(X + c)^2}{(x + c)^2} \quad \text{for all } c > 0, \ x > 0.$

Since EX = 0, $EX^2 = \sigma^2$, and the right side of (6) is minimum when
$c = \sigma^2/x$. We have

$$P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2}, \qquad x > 0.$$

Similar proof holds for (5).

Remark 1. Inequalities (4) and (5) cannot be improved (Problem 3).
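Theorem 2. Let $E|X|^4 < \infty$, and let EX = 0, $EX^2 = \sigma^2$. Then

(7) $P\{|X| \ge K\sigma\} \le \dfrac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2 \sigma^4} \quad \text{for } K > 1,$

where $\mu_4 = EX^4$.


Proof. For the proof let us substitute $(X^2 - \sigma^2)/(K^2 \sigma^2 - \sigma^2)$ for X and
take x = 1 in (4). Then

$$P\{X^2 - \sigma^2 \ge K^2 \sigma^2 - \sigma^2\} \le \frac{\mathrm{var}\{(X^2 - \sigma^2)/(K^2 \sigma^2 - \sigma^2)\}}{1 + \mathrm{var}\{(X^2 - \sigma^2)/(K^2 \sigma^2 - \sigma^2)\}}$$

$$= \frac{\mu_4 - \sigma^4}{\sigma^4 (K^2 - 1)^2 + \mu_4 - \sigma^4}$$

$$= \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2 \sigma^4}, \qquad K > 1,$$

as asserted.


Remark 2. Bound (7) is better than bound (3) if $K^2 \ge \mu_4/\sigma^4$ and worse if
$1 \le K^2 < \mu_4/\sigma^4$ (Problem 5).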
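Example 3. Let X have the uniform density

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$EX = \tfrac{1}{2}, \qquad \mathrm{var}(X) = \tfrac{1}{12}, \qquad \mu_4 = E\left(X - \tfrac{1}{2}\right)^4 = \tfrac{1}{80},$$

and

$$P\left\{\left|X - \tfrac{1}{2}\right| \ge 2\sqrt{\tfrac{1}{12}}\right\} \le \frac{\tfrac{1}{80} - \tfrac{1}{144}}{\tfrac{1}{80} + \tfrac{16}{144} - \tfrac{8}{144}} = \frac{4}{49},$$

that is,

$$P\left\{\left|X - \tfrac{1}{2}\right| < 2\sqrt{\tfrac{1}{12}}\right\} \ge \frac{45}{49} \approx .92,$$

which is much better than the bound given by Chebychev's inequality
(Example 2).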
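The arithmetic behind the two bounds for the uniform density can be reproduced exactly. The Python sketch below is an illustration with hypothetical function names; it evaluates the Chebychev bound (3) and the fourth-moment bound (7) for K = 2, $\sigma^2 = 1/12$, and $\mu_4 = 1/80$.

```python
from fractions import Fraction


def chebyshev_bound(K: Fraction) -> Fraction:
    """Right side of (3): 1 / K^2."""
    return 1 / (K * K)


def fourth_moment_bound(mu4: Fraction, sigma2: Fraction, K: Fraction) -> Fraction:
    """Right side of (7), with sigma^4 written as sigma2 ** 2."""
    s4 = sigma2**2
    return (mu4 - s4) / (mu4 + s4 * K**4 - 2 * K**2 * s4)


K = Fraction(2)
print(chebyshev_bound(K), fourth_moment_bound(Fraction(1, 80), Fraction(1, 12), K))
```

Since $K^2 = 4$ exceeds $\mu_4/\sigma^4 = 9/5$ here, the second bound is the smaller of the two, as Remark 2 predicts.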


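Theorem 3 (Lyapunov Inequality). Let $\beta_n = E|X|^n < \infty$. Then for arbitrary
k, $2 \le k \le n$, we have

(8) $\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}.$

Proof. Consider the quadratic form:

$$Q(u, v) = \int_{-\infty}^{\infty} \left(u |x|^{(k-1)/2} + v |x|^{(k+1)/2}\right)^2 f(x)\,dx,$$

where we have assumed that X is continuous with pdf f. We have

$$Q(u, v) = u^2 \beta_{k-1} + 2uv \beta_k + v^2 \beta_{k+1}.$$

Clearly $Q \ge 0$ for all u, v real. It follows (see P.2.4) that

$$\begin{vmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{vmatrix} \ge 0,$$

implying that

$$\beta_k^{2k} \le \beta_{k-1}^{k}\, \beta_{k+1}^{k}.$$

Thus

$$\beta_1^2 \le \beta_0 \beta_2, \quad \beta_2^4 \le \beta_1^2 \beta_3^2, \quad \ldots, \quad \beta_{n-1}^{2(n-1)} \le \beta_{n-2}^{n-1} \beta_n^{n-1},$$

where $\beta_0 = 1$. Multiplying successive k - 1 of these, we have

$$\beta_{k-1}^{k} \le \beta_k^{k-1} \quad \text{or} \quad \beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}.$$

It follows that

$$\beta_1 \le \beta_2^{1/2} \le \beta_3^{1/3} \le \cdots \le \beta_n^{1/n}.$$

The equality holds if and only if

$$\beta_k^{1/k} = \beta_{k+1}^{1/(k+1)} \quad \text{for } k = 1, 2, \ldots,$$

that is, $\{\beta_k^{1/k}\}$ is a constant sequence of numbers, which happens if and only
if |X| is degenerate.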
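The monotonicity of $\beta_k^{1/k}$ can be observed on small discrete examples. The Python sketch below uses hypothetical helper names and two arbitrary pmf's, one of which has |X| degenerate so that the sequence is constant; floating-point roots are compared with a small tolerance.

```python
def abs_moment(pmf: dict, k: int) -> float:
    """beta_k = E|X|^k for a discrete pmf given as {value: probability}."""
    return sum(abs(x) ** k * p for x, p in pmf.items())


def lyapunov_roots(pmf: dict, n: int) -> list:
    """The sequence beta_k^(1/k) for k = 1, ..., n."""
    return [abs_moment(pmf, k) ** (1.0 / k) for k in range(1, n + 1)]


def is_nondecreasing(values: list, tol: float = 1e-12) -> bool:
    """True if each entry is at least the previous one, up to rounding tolerance."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


print(lyapunov_roots({-2: 0.25, 0: 0.25, 1: 0.5}, 5))
print(lyapunov_roots({-3: 0.5, 3: 0.5}, 5))   # |X| = 3 with probability 1
```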


PROBLEMS 3.4 
1. For the rv with pdf

$$f(x; \lambda) = \frac{e^{-x} x^{\lambda}}{\lambda!}, \qquad x > 0,$$

where $\lambda \ge 0$ is an integer, show that

$$P\{0 < X < 2(\lambda + 1)\} > \frac{\lambda}{\lambda + 1}.$$


2. Let X be any rv, and suppose that the mgf of X, $M(t) = Ee^{tX}$, exists for every
$t > 0$. Then for any $t > 0$

$$P\{tX > s^2 + \log M(t)\} < e^{-s^2}.$$

3. Construct an example to show that inequalities (4) and (5) cannot be improved.


4. Let $g(\cdot)$ be a function satisfying $g(x) > 0$ for $x > 0$, g(x) increasing for $x > 0$,
and $Eg(|X|) < \infty$. Show that

$$P\{|X| > \varepsilon\} < \frac{Eg(|X|)}{g(\varepsilon)} \quad \text{for every } \varepsilon > 0.$$


5. Let X be an rv with EX = 0, $\mathrm{var}(X) = \sigma^2$, and $EX^4 = \mu_4$. Let K be any
positive real number. Show that

$$P\{|X| \ge K\sigma\} \le \begin{cases} 1 & \text{if } K^2 < 1, \\ \dfrac{1}{K^2} & \text{if } 1 \le K^2 < \dfrac{\mu_4}{\sigma^4}, \\ \dfrac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2 \sigma^4} & \text{if } K^2 \ge \dfrac{\mu_4}{\sigma^4}. \end{cases}$$

In other words, show that bound (7) is better than bound (3) if $K^2 \ge \mu_4/\sigma^4$ and
worse if $1 \le K^2 < \mu_4/\sigma^4$. Construct an example to show that the last inequalities
cannot be improved.


6. (a) Let X be an rv with df F, and let g be a strictly convex function on the
range of F. Let $\varphi$ be a Borel-measurable function. Suppose that EX, $E\varphi(X)$, and
Eg(X) all exist, and write $EX = \mu$. Let $l(x) = g(\mu) + K(x - \mu)$ be a line of support
for g at $x = \mu$. Also, let $h(x) = g(x) - l(x)$. Then for every $\varepsilon > 0$ we have

E 2 lg) . 

1) s ЫЫ 19691 + 18р a) ] Eh(X), 

(This inequality is due to M. Riesz; see, for example, Lukacs [76]. See P.2.3 for
definitions of strictly convex functions and line of support.)

(b) Derive Markov's inequality from Riesz's inequality. (c) Show that for $a > 0$
and $\varepsilon > 0$

$$P\{|X - \mu| \ge \varepsilon\} \le M e^{-a\mu} [Ee^{aX} - e^{a\mu}],$$

where $M = (e^{-a\varepsilon} - 1 + a\varepsilon)^{-1}$.

(Hint: Take $g(x) = e^{ax}$, $\varphi(x) = 0$ if $|x - \mu| < \varepsilon$ and $= 1$ if $|x - \mu| \ge \varepsilon$;
$h(x) = e^{a\mu} [e^{a(x - \mu)} - 1 - a(x - \mu)]$.)


7. For any rv X, show that

$$P\{X \ge 0\} \le \inf \{\varphi(t): t \ge 0\} \le 1,$$

where $\varphi(t) = Ee^{tX}$, $0 < \varphi(t) \le \infty$.




CHAPTER 4 


Random Vectors 


4.1 INTRODUCTION


In many experiments an observation is expressible, not as a single numer- 
ical quantity, but as a family of several separate numerical quantities. Thus, 
for example, if a pair of distinguishable dice is tossed, the outcome is a pair
(x, y), where x denotes the face value on the first die, and y, the face value
on the second die. Similarly, to record the height and weight of every person
in a certain community we need a pair (x, y), where the components
represent, respectively, the height and the weight of a particular individual. To
be able to describe such experiments mathematically we must study the multi- 
dimensional random variables or random vectors. 

In Section 2 we introduce the basic notions involved and study joint, mar- 
ginal, and conditional distributions. In Section 3 we examine independent 
random variables and investigate some consequences of independence. Sec- 
tions 4 and 5 deal with functions of random vectors and their induced dis- 
tributions. Sections 6 and 7 consider moments and their generating functions, 
and in Section 8 we study the functional relationship between two dependent 
random variables. 


4.2 RANDOM VECTORS 


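In this section we study multidimensional rv's. Let $(\Omega, \mathcal{S}, P)$ be a fixed
but otherwise arbitrary probability space.


Definition 1. The collection $X = (X_1, X_2, \dots, X_n)$ defined on $(\Omega, \mathcal{S}, P)$ into
$\mathcal{R}_n$ by

\[ X(\omega) = (X_1(\omega), X_2(\omega), \dots, X_n(\omega)), \qquad \omega \in \Omega, \]

is called an n-dimensional rv (or a random vector) if the inverse image of
every n-dimensional interval

\[ I = \{(x_1, x_2, \dots, x_n): -\infty < x_i \le a_i,\ a_i \in \mathcal{R},\ i = 1, 2, \dots, n\} \]

is also in $\mathcal{S}$, that is, if

\[ X^{-1}(I) = \{\omega: X_1(\omega) \le a_1, \dots, X_n(\omega) \le a_n\} \in \mathcal{S} \qquad \text{for } a_i \in \mathcal{R}. \]


Theorem 1. Let $X_1, X_2, \dots, X_n$ be n rv's on $(\Omega, \mathcal{S}, P)$. Then $X = (X_1, X_2, \dots, X_n)$
is an n-dimensional rv on $(\Omega, \mathcal{S}, P)$.

Proof. Let $I = \{(x_1, x_2, \dots, x_n): -\infty < x_i \le a_i,\ i = 1, 2, \dots, n\}$. Then

\[ \{(X_1, X_2, \dots, X_n) \in I\} = \{\omega: X_1(\omega) \le a_1, X_2(\omega) \le a_2, \dots, X_n(\omega) \le a_n\}
 = \bigcap_{k=1}^{n} \{\omega: X_k(\omega) \le a_k\} \in \mathcal{S}, \]

as asserted.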
From now on we will restrict our attention to two-dimensional random
variables. The discussion for the n-dimensional (n > 2) case is similar except
when indicated. The development follows closely the one-dimensional case.


Definition 2. The function F(·, ·), defined by

\[ F(x, y) = P\{X \le x, Y \le y\}, \qquad \text{all } (x, y) \in \mathcal{R}_2, \tag{1} \]

is known as the df of the rv (X, Y).

Following the discussion in Section 2.3, it is easily shown that

(i) F(x, y) is nondecreasing and continuous from the right with respect
to each coordinate, and

(ii) $\lim_{x \to +\infty,\, y \to +\infty} F(x, y) = F(+\infty, +\infty) = 1$,
$\lim_{y \to -\infty} F(x, y) = F(x, -\infty) = 0$ for all x,
$\lim_{x \to -\infty} F(x, y) = F(-\infty, y) = 0$ for all y.


But (i) and (ii) are not sufficient conditions to make any function F(·, ·) a df.


Example 1. Let F be a function of two variables defined by

\[ F(x, y) = \begin{cases} 0, & x < 0 \text{ or } x + y < 1 \text{ or } y < 0, \\ 1, & \text{otherwise.} \end{cases} \]

Then F satisfies both (i) and (ii) above. However, F is not a df since

\[ P\{\tfrac13 < X \le 1,\ \tfrac13 < Y \le 1\} = F(1, 1) + F(\tfrac13, \tfrac13) - F(1, \tfrac13) - F(\tfrac13, 1)
 = 1 + 0 - 1 - 1 = -1 \not\ge 0. \]
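As an informal check of Example 1 (a sketch that is not part of the original text), the function F and the rectangle increment can be coded directly. The helper name `rectangle_mass` and the choice of the rectangle (1/3, 1] × (1/3, 1] are illustrative assumptions; any rectangle whose lower-left corner lies below the line x + y = 1 and whose other three corners lie on or above it gives the same result.

```python
from fractions import Fraction


def F(x, y):
    """Candidate function of Example 1: 0 if x < 0, y < 0, or x + y < 1; else 1."""
    if x < 0 or y < 0 or x + y < 1:
        return 0
    return 1


def rectangle_mass(F, x1, x2, y1, y2):
    """Mass that a df F would assign to the rectangle (x1, x2] x (y1, y2]."""
    return F(x2, y2) + F(x1, y1) - F(x1, y2) - F(x2, y1)


third = Fraction(1, 3)
mass = rectangle_mass(F, third, 1, third, 1)
print(mass)  # a negative value, so F cannot be a df
```

A genuine df must return a nonnegative value from `rectangle_mass` for every rectangle, which is the extra condition that the next theorem adds to (i) and (ii).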


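Let $x_1 < x_2$ and $y_1 < y_2$. We have

\[ \begin{aligned}
P\{x_1 < X \le x_2,\ y_1 < Y \le y_2\}
 &= P\{X \le x_2, Y \le y_2\} + P\{X \le x_1, Y \le y_1\} \\
 &\quad - P\{X \le x_1, Y \le y_2\} - P\{X \le x_2, Y \le y_1\} \\
 &= F(x_2, y_2) + F(x_1, y_1) - F(x_1, y_2) - F(x_2, y_1) \\
 &\ge 0
\end{aligned} \]

for all pairs $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$, $y_1 < y_2$.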


Theorem 2. A function F of two variables is a df of some two-dimensional
rv if and only if it satisfies the following conditions:
(i) F is nondecreasing and right continuous with respect to both arguments,
(ii) $F(-\infty, y) = F(x, -\infty) = 0$ and $F(+\infty, +\infty) = 1$, and
(iii) for every $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$ and $y_1 < y_2$ the inequality

\[ F(x_2, y_2) - F(x_2, y_1) + F(x_1, y_1) - F(x_1, y_2) \ge 0 \tag{2} \]

holds.

The "only if" part of the theorem (every df satisfies these conditions) has
already been established. The "if" part will not be proved here. (See Tucker [132], 26.)


Theorem 2 can be generalized to the n-dimensional case in the following
manner.


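Theorem 3. A function $F(x_1, x_2, \dots, x_n)$ is the joint df of some n-dimensional
rv if and only if F is nondecreasing and continuous from the right with
respect to all the arguments $x_1, x_2, \dots, x_n$ and satisfies the following conditions:

(i) $F(-\infty, x_2, \dots, x_n) = F(x_1, -\infty, x_3, \dots, x_n) = \cdots = F(x_1, \dots, x_{n-1}, -\infty) = 0$,
$F(+\infty, +\infty, \dots, +\infty) = 1$,
and

(ii) for every $(x_1, x_2, \dots, x_n) \in \mathcal{R}_n$ and all $\varepsilon_i > 0$ $(i = 1, 2, \dots, n)$ the
inequality

\[ \begin{aligned}
& F(x_1 + \varepsilon_1, x_2 + \varepsilon_2, \dots, x_n + \varepsilon_n) \\
&\quad - \sum_{i=1}^{n} F(x_1 + \varepsilon_1, \dots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \dots, x_n + \varepsilon_n) \\
&\quad + \sum_{1 \le i < j \le n} F(x_1 + \varepsilon_1, \dots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \dots,
 x_{j-1} + \varepsilon_{j-1}, x_j, x_{j+1} + \varepsilon_{j+1}, \dots, x_n + \varepsilon_n) \\
&\quad - \cdots + (-1)^n F(x_1, x_2, \dots, x_n) \ge 0
\end{aligned} \tag{3} \]

holds.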


We restrict ourselves here to two-dimensional rv's of the discrete or the
continuous type, which we now define.


Definition 3. A two-dimensional rv (X, Y) is said to be of the discrete type
if it takes on pairs of values belonging to a countable set of pairs A with
probability 1. We call every pair $(x_i, y_j)$ that is assumed with positive
probability $p_{ij}$ a jump point of the df of (X, Y), and call $p_{ij}$ the jump at
$(x_i, y_j)$.


Clearly $\sum_{i,j} p_{ij} = 1$. As for the df of (X, Y), we have

\[ F(x, y) = \sum_{B} p_{ij}, \]

where $B = \{(i, j): x_i \le x,\ y_j \le y\}$.


Definition 4. Let (X, Y) be an rv of the discrete type that takes on pairs
of values $(x_i, y_j)$, $i = 1, 2, \dots$, and $j = 1, 2, \dots$. We call

\[ p_{ij} = P\{X = x_i, Y = y_j\}, \qquad i = 1, 2, \dots, \quad j = 1, 2, \dots, \]

the joint probability mass function (pmf) of (X, Y).


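Example 2. A die is rolled, and a coin is tossed independently. Let X be
the face value on the die, and let Y = 0 if a tail turns up and Y = 1 if a head
turns up. Then

\[ A = \{(1, 0), (2, 0), \dots, (6, 0), (1, 1), (2, 1), \dots, (6, 1)\}, \]

\[ p_{ij} = \tfrac{1}{12} \qquad \text{for } i = 1, 2, \dots, 6;\ j = 0, 1. \]

The df of (X, Y) is given by

\[ F(x, y) = \begin{cases}
0, & x < 1,\ -\infty < y < \infty;\ \ -\infty < x < \infty,\ y < 0, \\
\tfrac{1}{12}, & 1 \le x < 2,\ 0 \le y < 1, \\
\tfrac{1}{6}, & 2 \le x < 3,\ 0 \le y < 1;\ \ 1 \le x < 2,\ 1 \le y, \\
\tfrac{1}{4}, & 3 \le x < 4,\ 0 \le y < 1, \\
\tfrac{1}{3}, & 4 \le x < 5,\ 0 \le y < 1;\ \ 2 \le x < 3,\ 1 \le y, \\
\tfrac{5}{12}, & 5 \le x < 6,\ 0 \le y < 1, \\
\tfrac{1}{2}, & 6 \le x,\ 0 \le y < 1;\ \ 3 \le x < 4,\ 1 \le y, \\
\tfrac{2}{3}, & 4 \le x < 5,\ 1 \le y, \\
\tfrac{5}{6}, & 5 \le x < 6,\ 1 \le y, \\
1, & 6 \le x,\ 1 \le y.
\end{cases} \]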
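The df of a discrete rv is just the sum of the jumps at points lying below and to the left of (x, y). The following sketch (an illustration added here, with the names `pmf` and `joint_df` chosen only for this example) evaluates the df of the die-and-coin experiment by that summation, using exact rational arithmetic.

```python
from fractions import Fraction

# Joint pmf of the die-and-coin experiment: X is the die face (1..6),
# Y is 0 for a tail and 1 for a head; each of the 12 pairs has mass 1/12.
pmf = {(i, j): Fraction(1, 12) for i in range(1, 7) for j in (0, 1)}


def joint_df(x, y):
    """F(x, y) = sum of p_ij over all jump points with x_i <= x and y_j <= y."""
    return sum((p for (i, j), p in pmf.items() if i <= x and j <= y), Fraction(0))


print(joint_df(2.5, 0.5))  # region 2 <= x < 3, 0 <= y < 1
print(joint_df(4.2, 3))    # region 4 <= x < 5, 1 <= y
```

Each piece of the piecewise expression for F corresponds to one region on which the set of jump points counted by `joint_df` does not change.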


Theorem 4. A collection of nonnegative numbers $\{p_{ij}: i = 1, 2, \dots;\ j = 1, 2, \dots\}$
satisfying $\sum_{i,j=1}^{\infty} p_{ij} = 1$ is the pmf of some rv.


Proof. The proof of Theorem 4 is easy to construct with the help of
Theorem 2.


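Definition 5. A two-dimensional rv (X, Y) is said to be of the continuous
type if there exists a nonnegative function f(·, ·) such that for every pair
$(x, y) \in \mathcal{R}_2$ we have

\[ F(x, y) = \int_{-\infty}^{x} \left[ \int_{-\infty}^{y} f(u, v)\, dv \right] du, \tag{4} \]

where F is the df of (X, Y). The function f is called the (joint) pdf of (X, Y).


Clearly

\[ F(+\infty, +\infty) = \lim_{x \to \infty,\, y \to \infty} \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\, dv\, du
 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u, v)\, dv\, du = 1. \]

If f is continuous at (x, y), then

\[ \frac{\partial^2 F(x, y)}{\partial x\, \partial y} = f(x, y). \tag{5} \]

Example 3. Let (X, Y) be an rv with joint pdf given by

\[ f(x, y) = \begin{cases} e^{-(x+y)}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases} \]

Then

\[ F(x, y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}), & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases} \]


Theorem 5. If f is a nonnegative function satisfying $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$,
then f is the joint density function of some rv.


Proof. For the proof define

\[ F(x, y) = \int_{-\infty}^{x} \left[ \int_{-\infty}^{y} f(u, v)\, dv \right] du \]

and use Theorem 2.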


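Let (X, Y) be a two-dimensional rv with pmf

\[ p_{ij} = P\{X = x_i, Y = y_j\}. \]

Then

\[ \sum_{i=1}^{\infty} p_{ij} = \sum_{i=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{Y = y_j\} \tag{6} \]

and

\[ \sum_{j=1}^{\infty} p_{ij} = \sum_{j=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{X = x_i\}. \tag{7} \]

Let us write

\[ p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij} \qquad \text{and} \qquad p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}. \tag{8} \]

Then $p_{i\cdot} \ge 0$ and $\sum_{i=1}^{\infty} p_{i\cdot} = 1$, $p_{\cdot j} \ge 0$ and $\sum_{j=1}^{\infty} p_{\cdot j} = 1$, and $\{p_{i\cdot}\}$,
$\{p_{\cdot j}\}$ represent pmf's.


Definition 6. The collection of numbers $\{p_{i\cdot}\}$ is called the marginal pmf
of X, and the collection $\{p_{\cdot j}\}$, the marginal pmf of Y.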


Example 4. A fair coin is tossed three times. Let X = number of heads in
three tossings, and Y = difference, in absolute value, between number of
heads and number of tails. The joint pmf of (X, Y) is given in the following
table:

               X = 0   X = 1   X = 2   X = 3  |  P{Y = y}
    Y = 1        0      3/8     3/8      0    |    6/8
    Y = 3       1/8      0       0      1/8   |    2/8
    ------------------------------------------+-----------
    P{X = x}    1/8     3/8     3/8     1/8   |     1


The marginal pmf of Y is shown in the column representing row totals, and
the marginal pmf of X, in the row representing column totals.
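The table can be reproduced by enumerating the eight equally likely outcomes of three tosses and then summing rows and columns as in (6) and (7). This is only an illustrative sketch; the variable names `joint`, `marginal_x`, and `marginal_y` are not from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three fair coin tosses.
joint = defaultdict(Fraction)
for tosses in product("HT", repeat=3):
    heads = tosses.count("H")
    tails = 3 - heads
    joint[(heads, abs(heads - tails))] += Fraction(1, 8)

# Marginal pmf's: sum the joint pmf over the other coordinate.
marginal_x = defaultdict(Fraction)
marginal_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    marginal_x[x] += p
    marginal_y[y] += p

for y in (1, 3):
    row = [str(joint.get((x, y), Fraction(0))) for x in range(4)]
    print(f"Y = {y}:", row, "| row total", marginal_y[y])
print("column totals:", [str(marginal_x[x]) for x in range(4)])
```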


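If (X, Y) is an rv of the continuous type with pdf f, then

\[ f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \tag{9} \]

and

\[ f_2(y) = \int_{-\infty}^{\infty} f(x, y)\, dx \tag{10} \]

satisfy $f_1(x) \ge 0$, $f_2(y) \ge 0$, and $\int_{-\infty}^{\infty} f_1(x)\, dx = 1$, $\int_{-\infty}^{\infty} f_2(y)\, dy = 1$.
It follows that $f_1(x)$ and $f_2(y)$ are pdf's.


Definition 7. The functions $f_1(x)$ and $f_2(y)$, defined in (9) and (10), are
called the marginal pdf of X and the marginal pdf of Y, respectively.


Example 5. Let (X, Y) be jointly distributed with pdf f(x, y) = 2,
0 < x < y < 1, and = 0 otherwise. Then

\[ f_1(x) = \int_{x}^{1} 2\, dy = \begin{cases} 2 - 2x, & 0 < x < 1, \\ 0, & \text{otherwise,} \end{cases} \]

and

\[ f_2(y) = \int_{0}^{y} 2\, dx = \begin{cases} 2y, & 0 < y < 1, \\ 0, & \text{otherwise,} \end{cases} \]

are the two marginal density functions.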


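Definition 8. Let (X, Y) be an rv with df F. Then the marginal df of X is
defined by

\[ F_1(x) = F(x, \infty) = \lim_{y \to \infty} F(x, y)
 = \begin{cases} \sum_{x_i \le x} p_{i\cdot} & \text{if (X, Y) is discrete,} \\[4pt]
 \int_{-\infty}^{x} f_1(t)\, dt & \text{if (X, Y) is continuous.} \end{cases} \tag{11} \]

A similar definition is given for the marginal df of Y.


In general, given a df $F(x_1, x_2, \dots, x_n)$ of an n-dimensional rv $(X_1, X_2, \dots, X_n)$,
one can obtain any k-dimensional $(1 \le k \le n - 1)$ marginal df from it.
Thus the marginal df of $(X_{i_1}, X_{i_2}, \dots, X_{i_k})$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$,
is given by

\[ \lim_{\substack{x_i \to \infty \\ i \ne i_1, i_2, \dots, i_k}} F(x_1, x_2, \dots, x_n)
 = F(+\infty, \dots, +\infty, x_{i_1}, +\infty, \dots, +\infty, x_{i_2}, +\infty, \dots, +\infty, x_{i_k}, +\infty, \dots, +\infty). \]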

We now consider the concept of conditional distributions. Let (X, Y) be an
rv of the discrete type with pmf $p_{ij} = P\{X = x_i, Y = y_j\}$. The marginal
pmf's are $p_{i\cdot} = \sum_{j} p_{ij}$ and $p_{\cdot j} = \sum_{i} p_{ij}$. Recall that, if $A, B \in \mathcal{S}$ and
$PB > 0$, the conditional probability of A, given B, is given by

\[ P\{A \mid B\} = \frac{P(AB)}{P(B)}. \]

Take $A = \{X = x_i\} = \{(x_i, y): -\infty < y < \infty\}$, and $B = \{Y = y_j\} = \{(x, y_j):
-\infty < x < \infty\}$, and assume that $PB = P\{Y = y_j\} = p_{\cdot j} > 0$. Then
$A \cap B = \{X = x_i, Y = y_j\}$, and

\[ P\{A \mid B\} = P\{X = x_i \mid Y = y_j\} = \frac{p_{ij}}{p_{\cdot j}}. \]

For fixed j, the function $P\{X = x_i \mid Y = y_j\} \ge 0$ and $\sum_{i=1}^{\infty} P\{X = x_i \mid Y = y_j\} = 1$.
Thus $P\{X = x_i \mid Y = y_j\}$, for fixed j, defines a pmf.


Definition 9. Let (X, Y) be an rv of the discrete type. If $P\{Y = y_j\} > 0$,
the function

\[ P\{X = x_i \mid Y = y_j\} = \frac{P\{X = x_i, Y = y_j\}}{P\{Y = y_j\}}, \tag{12} \]

for fixed j, is known as the conditional pmf of X, given $Y = y_j$. A similar
definition is given for $P\{Y = y_j \mid X = x_i\}$, the conditional pmf of Y, given
$X = x_i$, provided that $P\{X = x_i\} > 0$.


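Example 6. For the joint pmf of Example 4, we have for Y = 1

\[ P\{X = i \mid Y = 1\} = \begin{cases} 0, & i = 0, 3, \\ \tfrac12, & i = 1, 2. \end{cases} \]

Similarly

\[ P\{X = i \mid Y = 3\} = \begin{cases} \tfrac12, & i = 0, 3, \\ 0, & i = 1, 2, \end{cases} \]

\[ P\{Y = j \mid X = 0\} = \begin{cases} 0, & j = 1, \\ 1, & j = 3, \end{cases} \]

and so on.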
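Formula (12) amounts to dividing one row of the joint table by its row total. A minimal sketch follows, with the joint pmf of the three-toss example (X = number of heads, Y = |heads − tails|) entered by hand; the function name `conditional_x_given_y` is an assumption made for this illustration.

```python
from fractions import Fraction

# Joint pmf of (X, Y) for three fair coin tosses; pairs not listed have mass 0.
joint = {
    (0, 3): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
    (2, 1): Fraction(3, 8),
    (3, 3): Fraction(1, 8),
}


def conditional_x_given_y(joint, y):
    """Return {x: P{X = x | Y = y}}, defined only when P{Y = y} > 0."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)
    if p_y == 0:
        raise ValueError("conditioning event has probability 0")
    xs = sorted({x for (x, _) in joint})
    return {x: joint.get((x, y), Fraction(0)) / p_y for x in xs}


print(conditional_x_given_y(joint, 1))
print(conditional_x_given_y(joint, 3))
```

For each fixed y the returned values are nonnegative and sum to 1, as a pmf must.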


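Next suppose that (X, Y) is an rv of the continuous type with joint
pdf f. Since P{X = x} = 0, P{Y = y} = 0 for any x, y, the probability
$P\{X \le x \mid Y = y\}$, or $P\{Y \le y \mid X = x\}$, is not defined. Let $\varepsilon > 0$, and
suppose that $P\{y - \varepsilon < Y \le y + \varepsilon\} > 0$. For every x and every interval
$(y - \varepsilon, y + \varepsilon]$ consider the conditional probability of the event $\{X \le x\}$,
given that $Y \in (y - \varepsilon, y + \varepsilon]$. We have

\[ P\{X \le x \mid y - \varepsilon < Y \le y + \varepsilon\}
 = \frac{P\{X \le x,\ y - \varepsilon < Y \le y + \varepsilon\}}{P\{Y \in (y - \varepsilon, y + \varepsilon]\}}. \]

For any fixed interval $(y - \varepsilon, y + \varepsilon]$, the above expression defines
the conditional df of X, given that $Y \in (y - \varepsilon, y + \varepsilon]$, provided that
$P\{Y \in (y - \varepsilon, y + \varepsilon]\} > 0$. We shall be interested in the case where the limit

\[ \lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y - \varepsilon, y + \varepsilon]\} \]

exists.


Definition 10. The conditional df of an rv X, given Y = y, is defined as
the limit

\[ \lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y - \varepsilon, y + \varepsilon]\}, \tag{13} \]

provided that the limit exists. If the limit exists, we denote it by $F_{X|Y}(x \mid y)$,
and define the conditional density function of X, given Y = y, $f_{X|Y}(x \mid y)$,
as a nonnegative function satisfying

\[ F_{X|Y}(x \mid y) = \int_{-\infty}^{x} f_{X|Y}(t \mid y)\, dt \qquad \text{for all } x \in \mathcal{R}. \tag{14} \]

For fixed y, we see that $f_{X|Y}(x \mid y) \ge 0$ and $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\, dx = 1$. Thus
$f_{X|Y}(x \mid y)$ is a pdf for fixed y.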

Suppose that (X, Y) is an rv of the continuous type with pdf f. At every
point (x, y) where f is continuous and the marginal pdf $f_2(y) > 0$ and is
continuous, we have

\[ F_{X|Y}(x \mid y) = \lim_{\varepsilon \to 0+} \frac{P\{X \le x,\ Y \in (y - \varepsilon, y + \varepsilon]\}}{P\{Y \in (y - \varepsilon, y + \varepsilon]\}}
 = \lim_{\varepsilon \to 0+} \frac{\int_{-\infty}^{x} \left\{ \int_{y - \varepsilon}^{y + \varepsilon} f(u, v)\, dv \right\} du}{\int_{y - \varepsilon}^{y + \varepsilon} f_2(v)\, dv}. \]

Dividing numerator and denominator by $2\varepsilon$ and passing to the limit as
$\varepsilon \to 0+$, we have

\[ F_{X|Y}(x \mid y) = \frac{\int_{-\infty}^{x} f(u, y)\, du}{f_2(y)}. \]

It follows that there exists a conditional pdf of X, given Y = y, that is expressed by

\[ f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}, \qquad f_2(y) > 0. \]

We have thus proved the following theorem.


Theorem 6. Let f be the pdf of an rv (X, Y) of the continuous type, and
let $f_2$ be the marginal pdf of Y. At every point (x, y) at which f is continuous
and $f_2(y) > 0$ and is continuous, the conditional pdf of X, given Y = y,
exists and is expressed by

\[ f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}. \tag{15} \]

Note that

\[ \int_{-\infty}^{x} f(u, y)\, du = f_2(y)\, F_{X|Y}(x \mid y), \]

so that

\[ F_1(x) = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{x} f(u, y)\, du \right\} dy = \int_{-\infty}^{\infty} f_2(y)\, F_{X|Y}(x \mid y)\, dy, \tag{16} \]

where $F_1$ is the marginal df of X.


It is clear that similar definitions may be made for the conditional df and
conditional pdf of the rv Y, given X = x, and an analogue of Theorem 6
holds.


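In the general case, let $(X_1, X_2, \dots, X_n)$ be an n-dimensional rv
of the continuous type with pdf $f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n)$. Also, let
$\{i_1 < i_2 < \cdots < i_k,\ j_1 < j_2 < \cdots < j_l\}$ be a subset of $\{1, 2, \dots, n\}$. Then

\[ F(x_{i_1}, \dots, x_{i_k} \mid x_{j_1}, \dots, x_{j_l})
 = \frac{\int_{-\infty}^{x_{i_1}} \cdots \int_{-\infty}^{x_{i_k}} f_{X_{i_1}, \dots, X_{i_k}, X_{j_1}, \dots, X_{j_l}}(u_{i_1}, \dots, u_{i_k}, x_{j_1}, \dots, x_{j_l}) \prod_{p=1}^{k} du_{i_p}}
 {\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_{i_1}, \dots, X_{i_k}, X_{j_1}, \dots, X_{j_l}}(u_{i_1}, \dots, u_{i_k}, x_{j_1}, \dots, x_{j_l}) \prod_{p=1}^{k} du_{i_p}}, \tag{17} \]

provided that the denominator exceeds 0. Here $f_{X_{i_1}, \dots, X_{i_k}, X_{j_1}, \dots, X_{j_l}}$
is the joint marginal pdf of $(X_{i_1}, \dots, X_{i_k}, X_{j_1}, \dots, X_{j_l})$. The conditional
densities are obtained in a similar manner.

The case in which $(X_1, X_2, \dots, X_n)$ is of the discrete type is similarly
treated.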


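Example 7. For the joint pdf of Example 5 we have

\[ f_{Y|X}(y \mid x) = \frac{2}{2 - 2x} = \frac{1}{1 - x}, \qquad x < y < 1, \]

so that the conditional pdf $f_{Y|X}$ is uniform on (x, 1). Also,

\[ f_{X|Y}(x \mid y) = \frac{2}{2y} = \frac{1}{y}, \qquad 0 < x < y, \]

which is uniform on (0, y). Thus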
{у>} - popes 


ezip- aj fibers 


Example 8. Let $\{X_n\}$ be a sequence of discrete rv's satisfying

\[ P\{X_n = x_{i_n} \mid X_1 = x_{i_1}, \dots, X_{n-1} = x_{i_{n-1}}\} = P\{X_n = x_{i_n} \mid X_{n-1} = x_{i_{n-1}}\}. \]

Such a sequence is called a Markovian dependent sequence. The probabilities
$P\{X_n = x_{i_n} \mid X_{n-1} = x_{i_{n-1}}\}$ are called transition probabilities.

As an example consider a sequence of trials with a fair coin. Let
$X_n$ = number of heads in the first n trials. If $X_{n-1} = k$, then $X_n$ must be
either k or k + 1. Thus

\[ P\{X_n = x_n \mid X_1 = x_1, \dots, X_{n-2} = x_{n-2}, X_{n-1} = k\}
 = \begin{cases} \tfrac12 & \text{if } x_n = k \text{ or } k + 1, \\ 0 & \text{otherwise.} \end{cases} \]

It follows that $\{X_n\}$ is a Markov sequence. In this case the conditional
probability is independent of n. Such a sequence is said to have stationary
transition probabilities.
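The coin-tossing claim can be checked by brute force for a small number of trials. The sketch below (an added illustration; `head_counts` and `transition_given_history` are hypothetical helper names) enumerates all equally likely toss sequences, groups them by the full history of head counts, and tabulates the conditional distribution of the next count.

```python
from fractions import Fraction
from itertools import product


def head_counts(tosses):
    """Running number of heads (1 = head, 0 = tail) after each trial."""
    counts, total = [], 0
    for t in tosses:
        total += t
        counts.append(total)
    return tuple(counts)


def transition_given_history(n):
    """Map each history (x_1, ..., x_{n-1}) to {x_n: P{X_n = x_n | history}}."""
    table = {}
    for tosses in product((0, 1), repeat=n):  # all 2^n equally likely paths
        path = head_counts(tosses)
        history, x_n = path[:-1], path[-1]
        bucket = table.setdefault(history, {})
        bucket[x_n] = bucket.get(x_n, 0) + 1
    return {
        h: {x: Fraction(c, sum(d.values())) for x, c in d.items()}
        for h, d in table.items()
    }


for history, probs in sorted(transition_given_history(4).items()):
    print(history, probs)
```

In every row the conditional distribution depends on the history only through its last entry k, and it puts mass 1/2 on each of k and k + 1.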
Similar considerations apply if we take $\{X_n\}$ to be rv's of the continuous
type. We leave the reader to modify the definition given above. For further
details we refer to Feller [28], Chapter 15, and Fisz [32], Chapters 7 and 8.


We conclude this section with a discussion of truncated distributions. 
Truncation is a very useful device in the study of probability limit theorems. 


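Definition 11. Let X be an rv on $(\Omega, \mathcal{S}, P)$ and $T \in \mathfrak{B}$ such that
$0 < P\{X \in T\} \le 1$. Then the conditional distribution $P\{X \le x \mid X \in T\}$,
defined for any real x, is called the truncated distribution of X.


If X is a discrete rv with pmf $p_i = P\{X = x_i\}$, $i = 1, 2, \dots$, the truncated
distribution of X is given by

\[ P\{X = x_i \mid X \in T\} = \frac{P\{X = x_i, X \in T\}}{P\{X \in T\}}
 = \begin{cases} \dfrac{p_i}{\sum_{x_j \in T} p_j} & \text{if } x_i \in T, \\[6pt] 0 & \text{otherwise.} \end{cases} \tag{18} \]

If X is of the continuous type with pdf f, then

\[ P\{X \le x \mid X \in T\} = \frac{P\{X \le x, X \in T\}}{P\{X \in T\}}
 = \frac{\int_{(-\infty, x] \cap T} f(y)\, dy}{\int_{T} f(y)\, dy}. \tag{19} \]

The pdf of the truncated distribution is given by

\[ h(x) = \begin{cases} \dfrac{f(x)}{\int_{T} f(y)\, dy}, & x \in T, \\[6pt] 0, & x \notin T. \end{cases} \tag{20} \]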

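Example 9. Let X be an rv with standard normal pdf

\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty. \]

Let $T = (-\infty, 0]$. Then $P\{X \in T\} = 1/2$, since X is symmetric and continuous.
For the truncated pdf, we have

\[ h(x) = \begin{cases} 2 f(x), & -\infty < x \le 0, \\ 0, & x > 0. \end{cases} \]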
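As a numerical illustration of the truncated pdf (a sketch only, not part of the original text), the normalizing constant can be approximated by a midpoint rule and the resulting density compared with 2f(x). The helper names `integrate` and `truncated_pdf` are assumptions of this example, and the lower limit −10 stands in for −∞, which costs a negligible amount of normal mass.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (k + 0.5) * width) for k in range(steps))


def truncated_pdf(f, lo, hi):
    """pdf of X truncated to T = [lo, hi]: f(x) / P{X in T} on T, 0 elsewhere."""
    mass = integrate(f, lo, hi)

    def h(x):
        return f(x) / mass if lo <= x <= hi else 0.0

    return h


h = truncated_pdf(normal_pdf, -10.0, 0.0)  # -10.0 approximates -infinity
print(h(-1.0), 2 * normal_pdf(-1.0))       # the two values nearly agree
print(h(0.5))                              # zero outside T
```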


Truncation is especially important in cases where the df F in question does
not have a finite mean. If X is an rv, we truncate X at some c > 0, where c
is finite, by replacing X by $X^c = X$ if $|X| \le c$, and $= 0$ if $|X| > c$. Then $X^c$
is X truncated at c, and all moments of $X^c$ exist and are finite. In fact, we
can always select c sufficiently large so that

\[ P\{X \ne X^c\} = P\{|X| > c\} \]

is arbitrarily small. The distribution of $X^c$ is given by

\[ P\{X^c \le x\} = P\{X \le x \mid |X| \le c\}
 = \frac{\int_{(-\infty, x] \cap [-c, c]} f(y)\, dy}{P\{|X| \le c\}} \tag{21} \]

if X is a continuous rv with pdf f, and by

\[ P\{X^c = x_j\} = \begin{cases} \dfrac{p_j}{\sum_{x_i \in [-c, c]} p_i} & \text{if } x_j \in [-c, c], \\[6pt] 0 & \text{otherwise,} \end{cases} \tag{22} \]

if X is an rv of the discrete type with pmf $p_j$, $j = 1, 2, \dots$.
We have for $\alpha > 0$

\[ E|X^c|^{\alpha} \le c^{\alpha}. \tag{23} \]
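Example 10. Let X be an rv with Cauchy pdf

\[ f(x) = \frac{1}{\pi}\, \frac{1}{1 + x^2}, \qquad -\infty < x < \infty. \]

Then EX does not exist. Let c > 0 be finite, and let us truncate X at c by
writing

\[ X^c = \begin{cases} X & \text{if } |X| \le c, \\ 0 & \text{if } |X| > c. \end{cases} \]

Then

\[ P\{|X| \le c\} = \frac{1}{\pi} \int_{-c}^{c} \frac{dx}{1 + x^2} = \frac{2}{\pi} \tan^{-1} c. \]

The pdf of $X^c$ is given by

\[ h(x) = \begin{cases} \dfrac{1}{\pi}\, \dfrac{1}{1 + x^2}\, \dfrac{\pi}{2 \tan^{-1} c}, & |x| \le c, \\[6pt] 0, & \text{otherwise,} \end{cases} \]

with

\[ EX^c = \frac{1}{2 \tan^{-1} c} \int_{-c}^{c} \frac{x}{1 + x^2}\, dx = 0, \]

\[ E(X^c)^2 = \frac{1}{2 \tan^{-1} c} \int_{-c}^{c} \frac{x^2}{1 + x^2}\, dx = \frac{c - \tan^{-1} c}{\tan^{-1} c}, \]

and so on.


PROBLEMS 4.2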


1. Let F(x, y) = 1 if x + 2y > 1, and = 0 if x + 2y < 1. Does F define a df in 
the plane? 


2. Let T be a closed triangle in the plane with vertices (0, 0), (0, √2), and
(√2, √2). Let F(x, y) denote the elementary area of the intersection of T with
$\{(x_1, x_2): x_1 \le x,\ x_2 \le y\}$. Show that F defines a df in the plane, and find its
marginal df's.

3. Let (X, Y) have the joint pdf f defined by f(x, y) = 1/2 inside the square with
corners at the points (1, 0), (0, 1), (−1, 0), and (0, −1) in the (x, y)-plane,
and = 0 otherwise. Find the marginal pdf's of X and Y and the two conditional
pdf's.

4. Let $f(x, y, z) = e^{-x-y-z}$, x > 0, y > 0, z > 0, and 0 otherwise, be the joint
pdf of (X, Y, Z). Compute P{X < Y < Z} and P{X = Y < Z}.


5. Let (X, Y) have the joint pdf f(x, у) = [xy + G?/2) if 0<х < 1,0 < y« 2, 
and = 0 otherwise. Find P(Y < ЦХ < 1/2] 


6. For df's $F, F_1, F_2, \dots, F_n$ show that

\[ 1 - \sum_{j=1}^{n} \{1 - F_j(x_j)\} \le F(x_1, x_2, \dots, x_n) \le \min_{1 \le j \le n} F_j(x_j) \]

for all real numbers $x_1, x_2, \dots, x_n$ if and only if the $F_j$'s are the marginal df's of F.
7. For the bivariate negative binomial distribution 


- ! 
BOE s Yo SERES DE Pe — p. в, 


where x, y = 0, l, 2,5, К>] is an integer, O<p, <1 


izations of the corresponding univariate distributions, 
8. For the bivariate Cauchy rv (X, Y) with pdf

\[ f(x, y) = \frac{c}{2\pi}\, (c^2 + x^2 + y^2)^{-3/2}, \qquad -\infty < x < \infty,\ -\infty < y < \infty,\ c > 0, \]

find the marginal pdf's of X and Y. Find the conditional pdf of Y given X = x.


9. For the bivariate beta rv (X, Y) with pdf

\[ f(x, y) = \frac{\Gamma(p_1 + p_2 + p_3)}{\Gamma(p_1)\Gamma(p_2)\Gamma(p_3)}\, x^{p_1 - 1} y^{p_2 - 1} (1 - x - y)^{p_3 - 1},
 \qquad x \ge 0,\ y \ge 0,\ x + y \le 1, \]

where $p_1, p_2, p_3$ are positive real numbers, find the marginal pdf's of X and Y and
the conditional pdf's. Find also the conditional pdf of Y/(1 − X), given X = x.


10. For the bivariate gamma rv (X, Y) with pdf

\[ f(x, y) = \frac{\beta^{\alpha + \gamma}}{\Gamma(\alpha)\Gamma(\gamma)}\, x^{\alpha - 1} (y - x)^{\gamma - 1} e^{-\beta y},
 \qquad 0 < x < y,\ \alpha, \beta, \gamma > 0, \]

find the marginal pdf's of X and Y and the conditional pdf's. Also, find the conditional
pdf of Y − X, given X = x, and the conditional distribution of X/Y, given
Y = y.

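11. For the bivariate hypergeometric rv (X, Y) with pmf

\[ P\{X = x, Y = y\} = \binom{N}{n}^{-1} \binom{Np_1}{x} \binom{Np_2}{y} \binom{N - Np_1 - Np_2}{n - x - y},
 \qquad x, y = 0, 1, 2, \dots, n, \]

where $x \le Np_1$, $y \le Np_2$, $n - x - y \le N(1 - p_1 - p_2)$, N, n integers with $n \le N$,
and $0 < p_1 < 1$, $0 < p_2 < 1$ so that $p_1 + p_2 \le 1$, find the marginal pmf's of X and
Y and the conditional pmf's.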
12. Let X be an rv with pdf f(x) = 1 if 0 ≤ x ≤ 1, and = 0 otherwise. Let
T = {x: 1/3 < x ≤ 1/2}. Find the pdf of the truncated distribution of X, its
mean, and its variance.


13. Let X be an rv with pmf

\[ P\{X = x\} = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \dots, \quad \lambda > 0. \]

Suppose that the value x = 0 cannot be observed. Find the pmf of the truncated
rv, its mean, and its variance.


4.3 INDEPENDENT RANDOM VARIABLES


We recall that the joint distribution of a random vector uniquely determines
the marginal distributions of the component random variables, but, in general,
knowledge of marginal distributions is not enough to determine the joint
distribution. Indeed, it is quite possible to have an infinite collection of joint
densities $f_\alpha$ with given marginal densities.


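Example 1 (Gumbel [42]). Let $f_1, f_2, f_3$ be three pdf's with corresponding
df's $F_1, F_2, F_3$, and let $\alpha$ be a constant, $|\alpha| \le 1$. Define

\[ f_\alpha(x_1, x_2, x_3) = f_1(x_1) f_2(x_2) f_3(x_3)
 \cdot \{1 + \alpha [2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]\}. \]

We show that $f_\alpha$ is a pdf for each $\alpha$ in [−1, 1] and that the collection of
densities $\{f_\alpha;\ -1 \le \alpha \le 1\}$ has the same marginal densities $f_1, f_2, f_3$. First
note that

\[ |[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]| \le 1, \]

so that

\[ 1 + \alpha [2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1] \ge 0. \]

Also,

\[ \begin{aligned}
\iiint f_\alpha(x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3
 &= 1 + \alpha \left( \int [2F_1(x_1) - 1] f_1(x_1)\, dx_1 \right) \left( \int [2F_2(x_2) - 1] f_2(x_2)\, dx_2 \right) \\
 &\qquad\qquad \cdot \left( \int [2F_3(x_3) - 1] f_3(x_3)\, dx_3 \right) \\
 &= 1 + \alpha \left\{ \left[ F_1^2(x_1) \Big|_{-\infty}^{\infty} - 1 \right] \left[ F_2^2(x_2) \Big|_{-\infty}^{\infty} - 1 \right] \left[ F_3^2(x_3) \Big|_{-\infty}^{\infty} - 1 \right] \right\} \\
 &= 1.
\end{aligned} \]

It follows that $f_\alpha$ is a density function. That $f_1, f_2, f_3$ are the marginal
densities of $f_\alpha$ follows similarly.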
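A quick numerical illustration of Gumbel's family, under the simplifying assumption (made only for this sketch) that all three marginals are uniform on (0, 1), so that each density is 1 and each df is F(x) = x on the unit interval. The function names are hypothetical, and the integrals are approximated by a midpoint grid.

```python
def gumbel_density(alpha, x1, x2, x3):
    """f_alpha on the unit cube when f_j = 1 and F_j(x) = x (uniform marginals)."""
    return 1.0 + alpha * (2 * x1 - 1) * (2 * x2 - 1) * (2 * x3 - 1)


def midpoints(m):
    """Midpoints of m equal subintervals of (0, 1)."""
    return [(k + 0.5) / m for k in range(m)]


def total_mass(alpha, m=20):
    """Midpoint approximation of the triple integral of f_alpha."""
    pts = midpoints(m)
    return sum(gumbel_density(alpha, a, b, c) for a in pts for b in pts for c in pts) / m**3


def marginal_x1(alpha, x1, m=20):
    """Midpoint approximation of the marginal density of X_1 at x1."""
    pts = midpoints(m)
    return sum(gumbel_density(alpha, x1, b, c) for b in pts for c in pts) / m**2


for alpha in (-1.0, 0.0, 0.7):
    print(alpha, total_mass(alpha), marginal_x1(alpha, 0.9))
```

Every value of α in [−1, 1] gives total mass 1 and the same uniform marginal, although the joint densities differ, which is the point of the example.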


In this section we deal with a very special class of distributions in which
the marginal distributions uniquely determine the joint distribution of a random
vector. First we consider the bivariate case.

Let F(x, y) and $F_1(x)$, $F_2(y)$, respectively, be the joint df of (X, Y) and
the marginal df's of X and Y.

Definition 1. We say that X and Y are independent if and only if

\[ F(x, y) = F_1(x)\, F_2(y) \qquad \text{for all } (x, y) \in \mathcal{R}_2. \tag{1} \]

Lemma 1. If X and Y are independent and a < c, b < d are real numbers,
then

\[ P\{a < X \le c,\ b < Y \le d\} = P\{a < X \le c\}\, P\{b < Y \le d\}. \tag{2} \]

Proof. The proof is left as an exercise.

Theorem 1. (a) A necessary and sufficient condition for rv's X, Y of the
discrete type to be independent is that

\[ P\{X = x_i, Y = y_j\} = P\{X = x_i\}\, P\{Y = y_j\} \tag{3} \]

for all pairs $(x_i, y_j)$. (b) Two rv's X and Y of the continuous type are
independent if and only if

\[ f(x, y) = f_1(x)\, f_2(y) \qquad \text{for all } (x, y) \in \mathcal{R}_2, \tag{4} \]

where $f, f_1, f_2$, respectively, are the joint and marginal densities of X
and Y, and f is everywhere continuous.

Proof. (a) Let X, Y be independent. Then from Lemma 1, letting a → c and
b → d, we get

\[ P\{X = c, Y = d\} = P\{X = c\}\, P\{Y = d\}. \]

Conversely,

\[ F(x, y) = \sum_{B} P\{X = x_i, Y = y_j\}, \]

where

\[ B = \{(i, j): x_i \le x,\ y_j \le y\}. \]

Then

\[ F(x, y) = \sum_{B} P\{X = x_i\}\, P\{Y = y_j\}
 = \sum_{x_i \le x} \Big[ \sum_{y_j \le y} P\{Y = y_j\} \Big] P\{X = x_i\} = F_1(x)\, F_2(y). \]

The proof of part (b) is left as an exercise.


Corollary. Let X and Y be independent rv's. Then $F_{Y|X}(y \mid x) = F_Y(y)$ for
all y, and $F_{X|Y}(x \mid y) = F_X(x)$ for all x.


We now return to Lemma 1. Recalling once again that every Borel set on
the real line can be obtained by a countable number of operations of unions,
intersections, and differences on semiclosed intervals, we can establish the
following theorem.


Theorem 2. X and Y are independent if and only if

\[ P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\, P\{Y \in A_2\} \tag{5} \]

for all Borel sets $A_1$ on the x-axis and $A_2$ on the y-axis.

Theorem 3. Let X and Y be independent rv's and f and g be Borel-measurable
functions. Then f(X) and g(Y) are also independent.

Proof. We have

\[ \begin{aligned}
P\{f(X) \le x,\ g(Y) \le y\} &= P\{X \in f^{-1}(-\infty, x],\ Y \in g^{-1}(-\infty, y]\} \\
 &= P\{X \in f^{-1}(-\infty, x]\}\, P\{Y \in g^{-1}(-\infty, y]\} \\
 &= P\{f(X) \le x\}\, P\{g(Y) \le y\}.
\end{aligned} \]

Note that a degenerate rv is independent of any rv.


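Example 2. Let X and Y be jointly distributed with pdf

\[ f(x, y) = \begin{cases} \dfrac{1 + xy}{4}, & |x| < 1,\ |y| < 1, \\[6pt] 0, & \text{otherwise.} \end{cases} \]

Then X and Y are not independent since $f_1(x) = 1/2$, |x| < 1, and
$f_2(y) = 1/2$, |y| < 1, are the marginal densities of X and Y, respectively.
However, the rv's $X^2$ and $Y^2$ are independent. Indeed,

\[ \begin{aligned}
P\{X^2 \le u,\ Y^2 \le v\} &= \int_{-v^{1/2}}^{v^{1/2}} \int_{-u^{1/2}}^{u^{1/2}} f(x, y)\, dx\, dy \\
 &= \frac14 \int_{-v^{1/2}}^{v^{1/2}} \left[ \int_{-u^{1/2}}^{u^{1/2}} (1 + xy)\, dx \right] dy \\
 &= u^{1/2}\, v^{1/2} \\
 &= P\{X^2 \le u\}\, P\{Y^2 \le v\}.
\end{aligned} \]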
Example 3. We return to Buffon's needle problem, discussed in Examples
1.2.9 and 1.3.7. Suppose that the rv R, which represents the distance from
the center of the needle to the nearest line, is uniformly distributed on (0, l].
Suppose further that Θ, the angle that the needle forms with this line, is
uniformly distributed on [0, π). If R and Θ are assumed to be independent,
the joint pdf is given by

\[ f_{R,\Theta}(r, \theta) = f_R(r)\, f_\Theta(\theta)
 = \begin{cases} \dfrac{1}{l}\, \dfrac{1}{\pi} & \text{if } 0 < r \le l,\ 0 \le \theta < \pi, \\[6pt] 0 & \text{otherwise.} \end{cases} \]

The needle will intersect the nearest line if and only if

\[ \frac{l}{2} \sin \Theta \ge R. \]

Therefore the required probability is given by

\[ P\left\{ \sin \Theta \ge \frac{2R}{l} \right\} = \int_{0}^{\pi} \int_{0}^{(l/2) \sin \theta} f_{R,\Theta}(r, \theta)\, dr\, d\theta = \frac{1}{\pi}. \]
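The double integral can be checked numerically. The sketch below assumes the setup as read here: R uniform on (0, l], Θ uniform on [0, π), R and Θ independent, and a crossing exactly when (l/2) sin Θ ≥ R. The function name `buffon_probability` is hypothetical. The inner integral over r is done in closed form and the outer integral over θ by the midpoint rule.

```python
import math


def buffon_probability(l=1.0, steps=100000):
    """Integrate the joint density 1/(l*pi) over 0 <= theta < pi, 0 < r <= (l/2) sin(theta)."""
    width = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * width
        inner = (l / 2) * math.sin(theta) / (l * math.pi)  # inner integral over r
        total += inner * width
    return total


print(buffon_probability())      # close to 1/pi
print(buffon_probability(2.5))   # the answer does not depend on l
print(1 / math.pi)
```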


Definition 2. A collection of rv's $X_1, X_2, \dots, X_n$ is said to be mutually or
completely independent if and only if

\[ F(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} F_i(x_i) \qquad \text{for all } (x_1, x_2, \dots, x_n) \in \mathcal{R}_n, \tag{6} \]

where F is the joint df of $(X_1, X_2, \dots, X_n)$, and $F_i$ $(i = 1, 2, \dots, n)$ is the
marginal df of $X_i$. $X_1, \dots, X_n$ are said to be pairwise independent if and
only if every pair of them are independent.


It is clear that an analogue of Theorem 1 holds, but we leave the reader
to construct it.


Example 4. In Example 1 we cannot write

\[ f_\alpha(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3) \]

except when α = 0. It follows that $X_1$, $X_2$, and $X_3$ are not independent except
when α = 0.


The following result is easy to prove.


Theorem 4. If $X_1, X_2, \dots, X_n$ are independent, every subcollection $X_{i_1},
X_{i_2}, \dots, X_{i_k}$ of $X_1, X_2, \dots, X_n$ is also independent.


Remark 1. It is quite possible for rv's $X_1, X_2, \dots, X_n$ to be pairwise independent
without being mutually independent. Let (X, Y, Z) have the joint
pmf defined by

\[ P\{X = x, Y = y, Z = z\} = \begin{cases}
\tfrac{3}{16} & \text{if } (x, y, z) \in \{(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)\}, \\[4pt]
\tfrac{1}{16} & \text{if } (x, y, z) \in \{(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)\}.
\end{cases} \]

Clearly X, Y, Z are not independent (Why?). We have

\[ \begin{aligned}
P\{X = x, Y = y\} &= \tfrac14, & (x, y) &\in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\
P\{Y = y, Z = z\} &= \tfrac14, & (y, z) &\in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\
P\{X = x, Z = z\} &= \tfrac14, & (x, z) &\in \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \\
P\{X = x\} &= \tfrac12, & x &= 0,\ x = 1, \\
P\{Y = y\} &= \tfrac12, & y &= 0,\ y = 1, \\
P\{Z = z\} &= \tfrac12, & z &= 0,\ z = 1.
\end{aligned} \]

It follows that X and Y, Y and Z, and X and Z are pairwise independent.
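The distinction between pairwise and mutual independence can be checked mechanically for a pmf on {0, 1}³. In the sketch below the two probability levels are taken to be 3/16 and 1/16; this is an assumption of the illustration, and any two unequal levels adding to 1/4 would lead to the same conclusions. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

HIGH, LOW = Fraction(3, 16), Fraction(1, 16)  # assumed levels; HIGH + LOW = 1/4
heavy = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
pmf = {pt: (HIGH if pt in heavy else LOW) for pt in product((0, 1), repeat=3)}


def marginal(pmf, axes):
    """Marginal pmf of the coordinates listed in axes."""
    out = {}
    for pt, p in pmf.items():
        key = tuple(pt[a] for a in axes)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def pairwise_independent(pmf):
    """True if every pair of coordinates has a joint pmf that factors."""
    singles = [marginal(pmf, (a,)) for a in range(3)]
    for a, b in ((0, 1), (0, 2), (1, 2)):
        pair = marginal(pmf, (a, b))
        if any(p != singles[a][(u,)] * singles[b][(v,)] for (u, v), p in pair.items()):
            return False
    return True


def mutually_independent(pmf):
    """True if the three-dimensional pmf is the product of its marginals."""
    s = [marginal(pmf, (a,)) for a in range(3)]
    return all(p == s[0][(x,)] * s[1][(y,)] * s[2][(z,)] for (x, y, z), p in pmf.items())


print(pairwise_independent(pmf), mutually_independent(pmf))
```

Mutual independence would force every point to carry mass 1/8, which neither level equals.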
Definition 3. A sequence $\{X_n\}$ of rv's is said to be independent if for every
n = 2, 3, 4, ... the rv's $X_1, X_2, \dots, X_n$ are independent.


Similarly one can speak of an independent family of rv's.


Definition 4. We say that rv's X and Y are identically distributed if X and
Y have the same df, that is,

\[ F_X(x) = F_Y(x) \qquad \text{for all } x \in \mathcal{R}, \]

where $F_X$ and $F_Y$ are the df's of X and Y, respectively.


Definition 5. We say that $\{X_n\}$ is a sequence of independent, identically
distributed (iid) rv's with common law $\mathcal{L}(X)$ if $\{X_n\}$ is an independent
sequence of rv's and the distribution of $X_n$ (n = 1, 2, ...) is the same as that
of X.

According to Definition 4, X and Y are identically distributed if and only
if they have the same distribution. It does not follow that X = Y with probability
1 (see Problem 8). If P{X = Y} = 1, we say that X and Y are equivalent
rv's. All Definition 4 says is that X and Y are identically distributed
if and only if

\[ P\{X \in A\} = P\{Y \in A\} \qquad \text{for all } A \in \mathfrak{B}. \]

Nothing is said about the equality of events $\{X \in A\}$ and $\{Y \in A\}$.

Definition 6. Two random vectors $(X_1, X_2, \dots, X_m)$ and $(Y_1, Y_2, \dots, Y_n)$
are said to be independent if

\[ F(x_1, x_2, \dots, x_m, y_1, y_2, \dots, y_n) = F_1(x_1, x_2, \dots, x_m)\, F_2(y_1, y_2, \dots, y_n) \tag{7} \]

for all $(x_1, x_2, \dots, x_m, y_1, y_2, \dots, y_n) \in \mathcal{R}_{m+n}$, where $F, F_1, F_2$ are the joint
distribution functions of $(X_1, X_2, \dots, X_m, Y_1, Y_2, \dots, Y_n)$, $(X_1, X_2, \dots, X_m)$,
and $(Y_1, Y_2, \dots, Y_n)$, respectively.


Of course, the independence of $(X_1, X_2, \dots, X_m)$ and $(Y_1, Y_2, \dots, Y_n)$ does
not imply the independence of components $X_1, X_2, \dots, X_m$ of X or components
$Y_1, Y_2, \dots, Y_n$ of Y.


Example 5. Let (X, Y) be an rv with joint density function

\[ f(x, y) = \tfrac14 [1 + xy(x^2 - y^2)], \qquad |x| \le 1,\ |y| \le 1, \]

and (U, V) be an rv with joint density

\[ g(u, v) = \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} (u^2 - 2\rho u v + v^2) \right\}, \qquad -\infty < u, v < \infty, \]

where |ρ| < 1 is a given constant. Let the joint density of (X, Y, U, V) be
given by

\[ h(x, y, u, v) = \frac{1}{8\pi \sqrt{1 - \rho^2}} [1 + xy(x^2 - y^2)]
 \cdot \exp\left\{ -\frac{1}{2(1 - \rho^2)} (u^2 - 2\rho u v + v^2) \right\}, \]

\[ |x| \le 1,\ |y| \le 1,\ -\infty < u, v < \infty. \]

Then (X, Y) and (U, V) are independent, but X and Y are not independent
and neither are U and V.


Theorem 5. Let $X = (X_1, X_2, \dots, X_m)$ and $Y = (Y_1, Y_2, \dots, Y_n)$ be independent
random vectors. Then the component $X_j$ of X (j = 1, 2, ..., m) and
the component $Y_k$ of Y (k = 1, 2, ..., n) are independent rv's. If h and g
are Borel-measurable functions, $h(X_1, X_2, \dots, X_m)$ and $g(Y_1, Y_2, \dots, Y_n)$ are
independent.


Proof. The proof is simple and is left as an exercise.


Remark 2. It is possible that an rv X may be independent of Y and also of 
Z, but X may not be independent of the random vector (Y, Z). See the 
example in Remark 1. 


We conclude this section by mentioning that the notion of independence 
of random vectors can be generalized to any number of random vectors. We 
leave the details to the reader. 


PROBLEMS 4.3 


1. Let A be a set of k numbers, and Ω be the set of all ordered samples of size n
from A with replacement. Also, let $\mathcal{S}$ be the set of all subsets of Ω, and P be a
probability defined on $\mathcal{S}$. Let $X_1, X_2, \dots, X_n$ be rv's defined on $(\Omega, \mathcal{S}, P)$ by setting

\[ X_i(a_1, a_2, \dots, a_n) = a_i \qquad (i = 1, 2, \dots, n). \]

Show that $X_1, X_2, \dots, X_n$ are independent if and only if each sample point is equally
likely.


2. Let $X_1, X_2$ be iid rv's with common pmf

\[ P\{X = \pm 1\} = \tfrac12. \]

Write $X_3 = X_1 X_2$. Show that $X_1, X_2, X_3$ are pairwise independent but not independent.

3. Let $(X_1, X_2, X_3)$ be a random vector with joint pmf

\[ f(x_1, x_2, x_3) = \begin{cases} \tfrac14 & \text{if } (x_1, x_2, x_3) \in A, \\ 0 & \text{otherwise,} \end{cases} \]

where

\[ A = \{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)\}. \]

Are $X_1, X_2, X_3$ independent? Are $X_1, X_2, X_3$ pairwise independent? Are $X_1 + X_2$
and $X_3$ independent?

4. Let X and Y be independent rv's such that XY is degenerate at c ≠ 0. Show that X and Y
are also degenerate.

5. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A, B \in \mathcal{S}$. Define X and Y so that

\[ X(\omega) = I_A(\omega), \qquad Y(\omega) = I_B(\omega) \qquad \text{for all } \omega \in \Omega. \]

Show that X and Y are independent if and only if A and B are independent.


6. Let X and Y be independent, and write Z = X + Y. Show that, if Z possesses 
moments of all orders, so do X and Y. 


7. A set of rv's $X_1, X_2, \dots, X_n$ is said to be interchangeable if the joint df of
$(X_1, X_2, \dots, X_n)$ remains unchanged under any permutation of the $X_i$'s. Let
$X_1, X_2, \dots, X_n$ be a set of positive, interchangeable rv's. Then

\[ E\left\{ \frac{X_1 + X_2 + \cdots + X_k}{X_1 + X_2 + \cdots + X_n} \right\} = \frac{k}{n}, \qquad 1 \le k \le n. \]

(Kullback and Yasaimaibodi [64])

8. Let X and Y be identically distributed. Construct an example to show that X
and Y need not be equal, that is, P{X = Y} need not equal 1.


9. Let X and Y be independent rv's. Show that, if X and X − Y are independent,
X must be degenerate.


10. Let X be a nonnegative rv with EX < ∞, and let T be independent of X with
uniform distribution on [0, a]. Write Y = a[(X + T)/a], where [x] denotes the
greatest integer in x. Show that EY = EX. (Rowland [104])


11. Let X and Y be independent rv's with pdf's f, and f,, respectively, where 


: hes ifx > 0 
9) = jh otherwise " APA 


and 


= fue ify>0 
£0) {0 otherwise’ “> % 


Let U = min (X, Y} and V = 1 if U = X, and = 0 if U = Y. Find the joint dis- 
tribution of (U, V). 


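12. Let $\{X_n\}$ be a sequence of independent rv's with pmf

\[ P\{X_n = 1\} = p, \qquad P\{X_n = -1\} = 1 - p, \qquad 0 < p < 1. \]

Write $S_n = \sum_{k=1}^{n} X_k$. Show that $\{S_n\}$ is a Markov dependent sequence.

13. Prove Lemma 1.


14. Let $X_1, X_2, \dots, X_n$ be rv's with joint pdf f, and let $f_j$ be the marginal pdf of
$X_j$ (j = 1, 2, ..., n). Show that $X_1, X_2, \dots, X_n$ are independent if and only if

\[ f(x_1, x_2, \dots, x_n) = \prod_{j=1}^{n} f_j(x_j) \qquad \text{for all } (x_1, x_2, \dots, x_n) \in \mathcal{R}_n. \]

15. Prove Theorem 4.
16. Prove Theorem 5.


4.4 FUNCTIONS OF RANDOM VECTORS


Here we first consider some simple functions of a random vector. Let X and
Y be rv's defined on a probability space $(\Omega, \mathcal{S}, P)$. Define

\[ (X + Y)(\omega) = X(\omega) + Y(\omega) \qquad \text{for all } \omega \in \Omega, \]

\[ (XY)(\omega) = X(\omega)\, Y(\omega) \qquad \text{for all } \omega \in \Omega, \]

\[ \left( \frac{X}{Y} \right)(\omega) = \frac{X(\omega)}{Y(\omega)} \qquad \text{for all } \omega \in \Omega, \text{ provided that } \{Y = 0\} = \emptyset. \]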


Theorem 1. Let X and Y be rv's defined on $(\Omega, \mathcal{S}, P)$. Then so are X + Y
and XY. If in addition $\{Y = 0\} = \emptyset$, then X/Y is also an rv.


Proof. We have

\[ \{X + Y < z\} = \bigcup_{r} \{X < r\} \cap \{Y < z - r\}, \tag{1} \]

where the union is over the set of all rational numbers r. Since the
set of all rational numbers is countable, we see that $A = \bigcup_r \{X < r\}
\cap \{Y < z - r\} \in \mathcal{S}$. We need only to show that the sets on the two sides of
(1) are the same. We first note that set $A \subset B = \{X + Y < z\}$. Let $\omega \in B$.
Then $X(\omega) + Y(\omega) < z$. Let $r_0$ be a rational number satisfying $X(\omega) < r_0
< z - Y(\omega)$. Then $X(\omega) < r_0$ and $Y(\omega) < z - r_0$, so that $\omega \in \{X < r_0\}
\cap \{Y < z - r_0\} \subset A$. It follows that $B \subset A$ and that X + Y is an rv.

Since X + Y, X − Y are rv's, $(X + Y)^2$ and $(X - Y)^2$ are also rv's
(Theorem 2.5.1), and it follows that

\[ XY = \frac{(X + Y)^2 - (X - Y)^2}{4} \]

is also an rv.
Finally, since $\{Y = 0\} = \emptyset$, we have

\[ \begin{aligned}
\left\{ \frac{X}{Y} \le x \right\} &= \left\{ \frac{X}{Y} \le x,\ Y < 0 \right\} + \left\{ \frac{X}{Y} \le x,\ Y > 0 \right\} \\
 &= \{X \ge xY,\ Y < 0\} + \{X \le xY,\ Y > 0\} \\
 &= \{X - xY \ge 0\} \cap \{Y < 0\} + \{X - xY \le 0\} \cap \{Y > 0\} \in \mathcal{S}.
\end{aligned} \]

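Let $X_1, X_2, \dots, X_n$ be rv's on $(\Omega, \mathcal{S}, P)$. Define

\[ M_n = \max\{X_1, X_2, \dots, X_n\} \tag{2} \]

by

\[ M_n(\omega) = \max\{X_1(\omega), X_2(\omega), \dots, X_n(\omega)\} \qquad \text{for all } \omega \in \Omega \]

and

\[ N_n = \min\{X_1, X_2, \dots, X_n\} \tag{3} \]

by

\[ N_n(\omega) = \min\{X_1(\omega), X_2(\omega), \dots, X_n(\omega)\} \qquad \text{for all } \omega \in \Omega. \]

Note that

\[ N_n = -\max\{-X_1, -X_2, \dots, -X_n\}. \]


Theorem 2. $M_n$ and $N_n$ are rv's.


Proof. It suffices to show that $M_n$ is an rv. We have

\[ \{M_n \le z\} = \{X_1 \le z, X_2 \le z, \dots, X_n \le z\} = \bigcap_{j=1}^{n} \{X_j \le z\} \in \mathcal{S}, \]

and the proof is complete.


Theorem 3. Let $X_1, X_2, \dots, X_n$ be independent rv's. Then for all $z \in \mathcal{R}$

\[ P\{M_n \le z\} = \prod_{j=1}^{n} P\{X_j \le z\} \tag{4} \]

and

\[ P\{N_n \le z\} = 1 - \prod_{j=1}^{n} [1 - P\{X_j \le z\}]. \tag{5} \]

Proof. We have

\[ P\{M_n \le z\} = P\{X_1 \le z, X_2 \le z, \dots, X_n \le z\} = \prod_{j=1}^{n} P\{X_j \le z\}, \]

and

\[ \begin{aligned}
P\{N_n \le z\} &= 1 - P\{N_n > z\} \\
 &= 1 - P\{X_1 > z, X_2 > z, \dots, X_n > z\} \\
 &= 1 - \prod_{j=1}^{n} [1 - P\{X_j \le z\}].
\end{aligned} \]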

Corollary. If $X_1, X_2, \dots, X_n$ are independent rv's with common distribution
function $F$, then

$$P\{M_n \le z\} = F^n(z), \quad z \in \mathcal{R}, \tag{6}$$

and

$$P\{N_n \le z\} = 1 - [1 - F(z)]^n, \quad z \in \mathcal{R}. \tag{7}$$

If, in addition, $F$ is absolutely continuous and $f$ is the density of $X_i$, the
densities of $M_n$ and $N_n$ are given, respectively, by

$$g(z) = n F^{n-1}(z) f(z) \tag{8}$$

and

$$h(z) = n [1 - F(z)]^{n-1} f(z) \tag{9}$$

at all continuity points of $f$.
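
Formulas (6) and (7) can be checked by brute force on a small discrete case. The sketch below is only an illustration: the choice of a fair six-sided die for the common df and the helper names `cdf_max`, `cdf_min`, `die_cdf`, and `brute_force` are assumptions made here, not part of the text.

```python
from fractions import Fraction
from itertools import product


def cdf_max(F, n, z):
    """P{M_n <= z} = F(z)**n for iid rv's with common df F, as in (6)."""
    return F(z) ** n


def cdf_min(F, n, z):
    """P{N_n <= z} = 1 - (1 - F(z))**n, as in (7)."""
    return 1 - (1 - F(z)) ** n


def die_cdf(z):
    # df of a fair six-sided die (illustrative assumption, not from the text)
    return Fraction(max(0, min(6, int(z))), 6)


def brute_force(n, z):
    """Enumerate all 6**n equally likely outcomes of n independent dice."""
    outcomes = list(product(range(1, 7), repeat=n))
    total = len(outcomes)
    p_max = Fraction(sum(max(o) <= z for o in outcomes), total)
    p_min = Fraction(sum(min(o) <= z for o in outcomes), total)
    return p_max, p_min


if __name__ == "__main__":
    print(cdf_max(die_cdf, 3, 4), cdf_min(die_cdf, 3, 4))
    print(brute_force(3, 4))
```

Exact rational arithmetic is used so that the closed forms and the enumeration can be compared with `==` rather than with a tolerance.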
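Theorem 4. Let $X_1, X_2, \dots, X_n$ be independent rv's. Then the joint df of $M_n$
and $N_n$ is given by

$$P\{M_n \le x, N_n \le y\} = \begin{cases} \displaystyle\prod_{i=1}^{n} P\{X_i \le x\} - \prod_{i=1}^{n} P\{y < X_i \le x\} & \text{if } x > y, \\[2ex] \displaystyle\prod_{i=1}^{n} P\{X_i \le x\} & \text{if } x \le y. \end{cases} \tag{10}$$

Proof.

$$\begin{aligned}
P\{M_n \le x, N_n \le y\} &= P\{M_n \le x\} - P\{M_n \le x, N_n > y\} \\
&= \prod_{i=1}^{n} P\{X_i \le x\} - P\{X_1 \le x, X_2 \le x, \dots, X_n \le x,\ X_1 > y, X_2 > y, \dots, X_n > y\}.
\end{aligned}$$

If $x \le y$,

$$P\{M_n \le x, N_n \le y\} = \prod_{i=1}^{n} P\{X_i \le x\}.$$

If $x > y$,

$$\begin{aligned}
P\{M_n \le x, N_n \le y\} &= \prod_{i=1}^{n} P\{X_i \le x\} - \prod_{i=1}^{n} P\{y < X_i \le x\} \\
&= \prod_{i=1}^{n} P\{X_i \le x\} - \prod_{i=1}^{n} \left[P\{X_i \le x\} - P\{X_i \le y\}\right].
\end{aligned}$$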


Corollary. If $X_1, X_2, \dots, X_n$ are independently distributed with common df
$F$, then

$$P\{M_n \le x, N_n \le y\} = \begin{cases} F^n(x) - [F(x) - F(y)]^n, & x > y, \\ F^n(x), & x \le y. \end{cases} \tag{11}$$

If, in addition, $F$ is absolutely continuous with density $f$, the joint density
of $M_n$ and $N_n$ is given by

$$u(x, y) = \begin{cases} 0, & x < y, \\ n(n-1)[F(x) - F(y)]^{n-2} f(x) f(y), & x \ge y. \end{cases} \tag{12}$$

Note that $M_n$ and $N_n$ are not independent.
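Example 1. Let $X_1, X_2, \dots, X_n$ be iid rv's with common density function

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } x \in [a, b], \\[1ex] 0 & \text{otherwise.} \end{cases}$$

Then the density of $M_n$ is

$$g(z) = \begin{cases} n \left(\dfrac{z - a}{b - a}\right)^{n-1} \dfrac{1}{b - a} & \text{for } z \in [a, b], \\[1ex] 0 & \text{otherwise.} \end{cases}$$

The density of $N_n$ is

$$h(z) = \begin{cases} n \left(\dfrac{b - z}{b - a}\right)^{n-1} \dfrac{1}{b - a} & \text{for } z \in [a, b], \\[1ex] 0 & \text{otherwise.} \end{cases}$$

The joint density of $M_n$ and $N_n$ is

$$u(x, y) = \begin{cases} 0 & \text{if } x < y, \\[1ex] \dfrac{n(n-1)(x - y)^{n-2}}{(b - a)^n} & \text{if } a \le y \le x \le b. \end{cases}$$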


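Example 2. Let $X_1, X_2$ be iid rv's with common probability function

$$P\{X = k\} = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dots,$$

where $\lambda > 0$ is a constant. Let $M = \max(X_1, X_2)$ and $N = \min(X_1, X_2)$. We
have

$$P\{X_1 = x, M = m\} = \begin{cases} 0, & x > m, \\ P\{X_1 = x, X_2 \le x\}, & x = m, \\ P\{X_1 = x, X_2 = m\}, & x < m. \end{cases}$$

By independence, therefore, the marginal pmf of $M$ is given by

$$\begin{aligned}
P\{M = m\} &= \sum_{x=0}^{m} P\{X_1 = x, M = m\} \\
&= \frac{e^{-\lambda}\lambda^m}{m!} \sum_{x=0}^{m} \frac{e^{-\lambda}\lambda^x}{x!} + \frac{e^{-\lambda}\lambda^m}{m!} \sum_{x=0}^{m-1} \frac{e^{-\lambda}\lambda^x}{x!} \\
&= \frac{e^{-2\lambda}\lambda^m}{m!} \left[\frac{\lambda^m}{m!} + 2 \sum_{x=0}^{m-1} \frac{\lambda^x}{x!}\right], \quad m = 0, 1, 2, \dots.
\end{aligned}$$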


Clearly 
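Similarly,

$$P\{X_1 = x, N = n\} = \begin{cases} 0 & \text{if } x < n, \\ P\{X_1 = n, X_2 \ge n\} & \text{if } x = n, \\ P\{X_1 = x, X_2 = n\} & \text{if } x > n, \end{cases}$$

so that

$$\begin{aligned}
P\{N = n\} &= \sum_{x=n}^{\infty} P\{X_1 = x, N = n\} \\
&= P\{X_1 = n, X_2 \ge n\} + \sum_{x=n+1}^{\infty} P\{X_1 = x, X_2 = n\} \\
&= \frac{e^{-\lambda}\lambda^n}{n!} \sum_{j=n}^{\infty} \frac{e^{-\lambda}\lambda^j}{j!} + \frac{e^{-\lambda}\lambda^n}{n!} \sum_{x=n+1}^{\infty} \frac{e^{-\lambda}\lambda^x}{x!} \\
&= \frac{e^{-2\lambda}\lambda^n}{n!} \left[\frac{\lambda^n}{n!} + 2 \sum_{j=n+1}^{\infty} \frac{\lambda^j}{j!}\right], \quad n = 0, 1, 2, \dots.
\end{aligned}$$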
Theorems 1 and 2 are special cases of the following general result.


Theorem 5. Let $f\colon \mathcal{R}_n \to \mathcal{R}_m$ be a Borel-measurable function, that is,
if $B \in \mathfrak{B}_m$, then $f^{-1}(B) \in \mathfrak{B}_n$. If $X = (X_1, X_2, \dots, X_n)$ is an n-dimensional
rv $(n \ge 1)$, then $f(X)$ is an m-dimensional rv.


Proof. For $B \in \mathfrak{B}_m$,

$$\{f(X_1, X_2, \dots, X_n) \in B\} = \{(X_1, X_2, \dots, X_n) \in f^{-1}(B)\},$$

and, since $f^{-1}(B) \in \mathfrak{B}_n$, it follows that $\{(X_1, X_2, \dots, X_n) \in f^{-1}(B)\} \in \mathcal{S}$, which
concludes the proof.


In Theorems 3 and 4 we computed the df of some simple functions of
$X_1, X_2, \dots, X_n$ by a combinatorial argument. It is possible to compute the
distribution of rv's such as $X + Y$, $X - Y$, $XY$, and $X/Y$ (where $\{Y = 0\} = \emptyset$)
directly. It will be convenient, however, to state a general result that will be
of subsequent use.

The case in which $X = (X_1, X_2, \dots, X_n)$ is a random vector of the discrete
type causes no special problem. Let $A \subseteq \mathcal{R}_n$ be the countable set of points
such that $P\{X \in A\} = 1$, and assume that every point in $A$ has positive mass.
Let $u_1 = g_1(x_1, x_2, \dots, x_n)$, $u_2 = g_2(x_1, x_2, \dots, x_n)$, $\dots$, $u_n = g_n(x_1, x_2, \dots, x_n)$
define a one-to-one mapping of $A$ onto $B$. Then, writing $u = (u_1, u_2, \dots, u_n)$,
we have

$$\begin{aligned}
P\{U = u\} &= P\{g_1(X) = u_1, g_2(X) = u_2, \dots, g_n(X) = u_n\} \\
&= P\{X_1 = h_1(u), X_2 = h_2(u), \dots, X_n = h_n(u)\} \quad \text{for all } u \in B,
\end{aligned}$$

and

$$P\{U = u\} = 0 \quad \text{if } u \notin B,$$

where $x_1 = h_1(u)$, $x_2 = h_2(u)$, $\dots$, $x_n = h_n(u)$ is the inverse transformation.
The marginal pmf of any $U_j$ or the joint marginal pmf of any subcollection
of $U_1, U_2, \dots, U_n$ is now easily computed by summing over the remaining
$u_j$'s.


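Example 3. Let $X$ have the binomial pmf

$$P\{X = k\} = \begin{cases} \dbinom{n}{k} p^k (1 - p)^{n-k}, & k = 0, 1, 2, \dots, n;\ 0 < p < 1, \\[1ex] 0 & \text{otherwise.} \end{cases}$$

Let $Y$ be independent of $X$ and have the same pmf as $X$. We wish to find
the pmf of $X + Y$. We have

$$\begin{aligned}
P\{X + Y = z\} &= \sum_{k=0}^{n} P\{X = k, Y = z - k\} \\
&= \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k} \binom{n}{z - k} p^{z-k} (1 - p)^{n-z+k} \\
&= \sum_{k=0}^{n} \binom{n}{k} \binom{n}{z - k} p^z (1 - p)^{2n-z} \\
&= \binom{2n}{z} p^z (1 - p)^{2n-z}, \quad z = 0, 1, 2, \dots, 2n.
\end{aligned}$$

Suppose that $W = X - Y$; then $W$ takes on values $w = -n, -(n-1), \dots, -1, 0, 1, \dots, n$.
We have

$$\begin{aligned}
P\{W = w\} &= \sum_{k=0}^{n} P\{X = k + w\}\, P\{Y = k\} \\
&= \sum_{k=0}^{n} \binom{n}{k + w} p^{k+w} (1 - p)^{n-k-w} \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \left(\frac{p}{1 - p}\right)^{w} \sum_{k=0}^{n} \binom{n}{k + w} \binom{n}{k} p^{2k} (1 - p)^{2n-2k},
\end{aligned}$$

where $w = -n, -(n-1), \dots, -1, 0, 1, \dots, n$, and terms with $k + w$ outside $\{0, 1, \dots, n\}$ vanish. Thus

$$P\{W = 0\} = \sum_{k=0}^{n} \binom{n}{k}^2 p^{2k} (1 - p)^{2n-2k},$$

and so on.

Finally, let $U = X/(Y + 1)$ and $V = Y + 1$. Then $x = uv$ and $y = v - 1$.
The joint pmf of $U$ and $V$ is given by

$$\begin{aligned}
P\{U = u, V = v\} &= \binom{n}{uv} p^{uv} (1 - p)^{n-uv} \binom{n}{v - 1} p^{v-1} (1 - p)^{n+1-v} \\
&= \binom{n}{uv} \binom{n}{v - 1} p^{uv+v-1} (1 - p)^{2n+1-uv-v}.
\end{aligned}$$

Here $V$ takes values $1, 2, \dots, n + 1$, and $UV$ takes values $0, 1, 2, \dots, n$. Thus

$$P\{U = n, V = 1\} = p^n (1 - p)^n,$$

and, in general,

$$P\left\{U = \frac{k}{j + 1},\ V = j + 1\right\} = \binom{n}{k} \binom{n}{j} p^{k+j} (1 - p)^{2n-k-j},$$

$k = 0, 1, 2, \dots, n$; $j = 0, 1, 2, \dots, n$.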
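
The discrete computations of Example 3 can be reproduced mechanically by summing products of probabilities over all pairs of values. The following is a minimal sketch under stated assumptions: the helper names `binom_pmf`, `pmf_sum`, and `pmf_diff`, and the particular values $n = 4$, $p = 1/3$, are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p):
    """Binomial pmf as a list indexed by k = 0, 1, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def pmf_sum(px, py):
    """pmf of X + Y for independent X, Y supported on 0, 1, 2, ..."""
    out = [0] * (len(px) + len(py) - 1)
    for i, a in enumerate(px):
        for j, b in enumerate(py):
            out[i + j] += a * b
    return out


def pmf_diff(px, py):
    """pmf of X - Y as a dict {w: P{X - Y = w}}."""
    out = {}
    for i, a in enumerate(px):
        for j, b in enumerate(py):
            out[i - j] = out.get(i - j, 0) + a * b
    return out


n, p = 4, Fraction(1, 3)          # illustrative parameter values
px = binom_pmf(n, p)
z_pmf = pmf_sum(px, px)           # should agree with the binomial(2n, p) pmf
w_pmf = pmf_diff(px, px)          # pmf of W = X - Y
```

With exact fractions, `z_pmf` can be compared directly with `binom_pmf(2 * n, p)`, and `w_pmf[0]` with the sum of squared binomial probabilities.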
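Example 4. Consider the bivariate negative binomial distribution with pmf

$$P\{X = x, Y = y\} = \frac{(x + y + k - 1)!}{x!\, y!\, (k - 1)!}\, p_1^x\, p_2^y\, (1 - p_1 - p_2)^k,$$

where $x, y = 0, 1, 2, \dots$; $k \ge 1$ is an integer; $p_1, p_2 \in (0, 1)$; and
$p_1 + p_2 < 1$. Let us find the pmf of $U = X + Y$. We introduce an
rv $V = Y$ (see Remark 1 below) so that $u = x + y$, $v = y$ represents a
one-to-one mapping of $A = \{(x, y)\colon x, y = 0, 1, 2, \dots\}$ onto the set
$B = \{(u, v)\colon v = 0, 1, 2, \dots, u;\ u = 0, 1, 2, \dots\}$ with inverse map
$x = u - v$, $y = v$. It follows that the joint pmf of $(U, V)$ is given by

$$P\{U = u, V = v\} = \begin{cases} \dfrac{(u + k - 1)!}{(u - v)!\, v!\, (k - 1)!}\, p_1^{u-v}\, p_2^{v}\, (1 - p_1 - p_2)^k & \text{for } (u, v) \in B, \\[1ex] 0 & \text{otherwise.} \end{cases}$$

The marginal pmf of $U$ is given by

$$\begin{aligned}
P\{U = u\} &= \frac{(u + k - 1)!\,(1 - p_1 - p_2)^k}{(k - 1)!} \sum_{v=0}^{u} \frac{p_1^{u-v}\, p_2^{v}}{(u - v)!\, v!} \\
&= \frac{(u + k - 1)!}{(k - 1)!\, u!}\, (1 - p_1 - p_2)^k\, (p_1 + p_2)^u \\
&= \binom{u + k - 1}{u} (p_1 + p_2)^u (1 - p_1 - p_2)^k, \quad u = 0, 1, 2, \dots.
\end{aligned}$$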


Next suppose that the mapping of $x$ to $u$ is not a one-to-one mapping of
$A$ onto $B$, but that for each $u \in B$ there are a finite (or even countable)
number of inverses $x_1, x_2, \dots$. In that case

$$\begin{aligned}
P\{U = u\} &= P\{g_1(X) = u_1, g_2(X) = u_2, \dots, g_n(X) = u_n\} \\
&= \sum_{j} P\{g_1(X) = u_1, g_2(X) = u_2, \dots, g_n(X) = u_n,\ X = x_j\}.
\end{aligned}$$


Example 5. Let $(X, Y)$ be an rv of the discrete type with joint pmf as
follows:


wj o ne Б | o 
Let $U = |X|$ and $V = Y^2$. Then

$$A = \{(x, y)\colon x \in \{-1, 0, 1\},\ y \in \{-2, 1, 2\}\}$$

and

$$B = \{(u, v)\colon u = 0 \text{ or } 1 \text{ and } v = 1 \text{ or } 4\}.$$


The pmf of (U, V) is computed in the following table: 


(и, v) _ Inverses (x, y) P{U =u, V = у} 
(0,1) (0,1) È 
(0,4) | (0, 2), (0,—2) h 
(41) |... (-1,.1), (1, 1): i 
(1, 4). (15-2), (= 1, 2), (1, 22) (2). 4 
- The marginal pmf of U is given by 
P(U = u) -{? зж 
and that of V by 
PIV = 9 -{¢ "gi 


The case in which $X = (X_1, X_2, \dots, X_n)$ is of the continuous type is not as
simple. We would like to obtain an analogue of Theorem 2.5.3. To this end,
let $X$ be an rv with joint pdf $f(x_1, x_2, \dots, x_n)$, and let $U = g(X_1, X_2, \dots, X_n)$
$= (U_1, U_2, \dots, U_n)$, where

$$U_i = g_i(X_1, X_2, \dots, X_n), \quad i = 1, 2, \dots, n,$$

be a mapping of $\mathcal{R}_n$ into itself. Then

$$P\{U \in B\} = P\{X \in g^{-1}(B)\} = \int_{g^{-1}(B)} f(x_1, x_2, \dots, x_n)\, dx_1\, dx_2 \cdots dx_n,$$

where $g^{-1}(B) = \{x = (x_1, x_2, \dots, x_n) \in \mathcal{R}_n\colon g(x) \in B\}$. Let us choose $B$ to be
the n-dimensional interval

$$B = B_u = \{(u_1', u_2', \dots, u_n')\colon -\infty < u_i' \le u_i,\ i = 1, 2, \dots, n\}.$$

Then the joint df of $U$ is given by

$$\begin{aligned}
P\{U \in B_u\} = G(u) &= P\{g_1(X) \le u_1, g_2(X) \le u_2, \dots, g_n(X) \le u_n\} \\
&= \int_{g^{-1}(B_u)} f(x_1, x_2, \dots, x_n)\, dx_1\, dx_2 \cdots dx_n,
\end{aligned}$$

and (if $G$ is absolutely continuous) the pdf of $U$ is given by

$$w(u) = \frac{\partial^n G(u)}{\partial u_1\, \partial u_2 \cdots \partial u_n}$$

at every continuity point $u$ of $w$. Under certain conditions it is possible to
write $w$ in terms of $f$ by making a change of variable in the multiple integral
(see Theorem P.2.17 for a statement of this result).


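Theorem 6. Let $(X_1, X_2, \dots, X_n)$ be an n-dimensional rv of the continuous
type with pdf $f(x_1, x_2, \dots, x_n)$.
(a) Let

$$\begin{aligned}
u_1 &= g_1(x_1, x_2, \dots, x_n), \\
u_2 &= g_2(x_1, x_2, \dots, x_n), \\
&\;\;\vdots \\
u_n &= g_n(x_1, x_2, \dots, x_n)
\end{aligned}$$

be a one-to-one mapping of $\mathcal{R}_n$ into itself, that is, there exists the
inverse transformation

$$x_1 = h_1(u_1, u_2, \dots, u_n),\quad x_2 = h_2(u_1, u_2, \dots, u_n),\quad \dots,\quad x_n = h_n(u_1, u_2, \dots, u_n)$$

defined over the range of the transformation.
(b) Assume that both the mapping and its inverse are continuous.
(c) Assume that the partial derivatives

$$\frac{\partial x_i}{\partial u_j}, \quad 1 \le i \le n,\ 1 \le j \le n,$$

exist and are continuous.
(d) Assume that the Jacobian $J$ of the inverse transformation

$$J = \frac{\partial(x_1, x_2, \dots, x_n)}{\partial(u_1, u_2, \dots, u_n)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial u_1} & \dfrac{\partial x_1}{\partial u_2} & \cdots & \dfrac{\partial x_1}{\partial u_n} \\ \dfrac{\partial x_2}{\partial u_1} & \dfrac{\partial x_2}{\partial u_2} & \cdots & \dfrac{\partial x_2}{\partial u_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_n}{\partial u_1} & \dfrac{\partial x_n}{\partial u_2} & \cdots & \dfrac{\partial x_n}{\partial u_n} \end{vmatrix}$$

is different from zero for $(u_1, u_2, \dots, u_n)$ in the range of the transformation.
Then the random vector $(U_1, U_2, \dots, U_n)$ has a joint absolutely
continuous df with pdf given by

$$w(u_1, u_2, \dots, u_n) = |J|\, f\bigl(h_1(u_1, \dots, u_n), h_2(u_1, \dots, u_n), \dots, h_n(u_1, \dots, u_n)\bigr). \tag{13}$$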
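Proof. For $(u_1, u_2, \dots, u_n) \in \mathcal{R}_n$ let

$$B = \{(u_1', u_2', \dots, u_n') \in \mathcal{R}_n\colon -\infty < u_i' \le u_i,\ i = 1, 2, \dots, n\}.$$

Then

$$g^{-1}(B) = \{x \in \mathcal{R}_n\colon g(x) \in B\} = \{(x_1, x_2, \dots, x_n)\colon g_i(x) \le u_i,\ i = 1, 2, \dots, n\}$$

and

$$\begin{aligned}
G_U(u) = P\{U \in B\} &= P\{X \in g^{-1}(B)\} \\
&= \int_{g^{-1}(B)} f(x_1, x_2, \dots, x_n)\, dx_1\, dx_2 \cdots dx_n \\
&= \int_{-\infty}^{u_1} \cdots \int_{-\infty}^{u_n} f\bigl(h_1(u'), \dots, h_n(u')\bigr) \left|\frac{\partial(x_1, x_2, \dots, x_n)}{\partial(u_1', u_2', \dots, u_n')}\right| du_1' \cdots du_n'.
\end{aligned}$$

Result (13) now follows on differentiation of df $G_U$.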


Remark 1. In actual applications we will not know the mapping from
$x_1, x_2, \dots, x_n$ to $u_1, u_2, \dots, u_n$ completely, but one or more of the functions
$g_i$ will be known. If only $k$, $1 \le k < n$, of the $g_i$'s are known, we introduce
arbitrarily $n - k$ functions such that the conditions of the theorem are
satisfied. To find the joint marginal density of these $k$ variables we simply
integrate the $w$ function over all the $n - k$ variables that were arbitrarily
introduced.


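Remark 2. An analogue of Theorem 2.5.4 holds, which we state without
proof.

Let $X = (X_1, X_2, \dots, X_n)$ be an rv of the continuous type with joint pdf
$f$, and let $u_i = g_i(x_1, x_2, \dots, x_n)$, $i = 1, 2, \dots, n$, be a mapping of $\mathcal{R}_n$ into
itself. Suppose that for each $u$ the transformation $g$ has a finite number
$k = k(u)$ of inverses. Suppose further that $\mathcal{R}_n$ can be partitioned into $k$
disjoint sets $A_1, A_2, \dots, A_k$ such that the transformation $g$ from $A_i$ $(i = 1, 2,
\dots, k)$ into $\mathcal{R}_n$ is one-to-one with inverse transformation

$$x_1 = h_{1i}(u_1, u_2, \dots, u_n),\quad \dots,\quad x_n = h_{ni}(u_1, u_2, \dots, u_n), \quad i = 1, 2, \dots, k.$$

Suppose that the first partial derivatives are continuous and that each
Jacobian

$$J_i = \begin{vmatrix} \dfrac{\partial h_{1i}}{\partial u_1} & \dfrac{\partial h_{1i}}{\partial u_2} & \cdots & \dfrac{\partial h_{1i}}{\partial u_n} \\ \dfrac{\partial h_{2i}}{\partial u_1} & \dfrac{\partial h_{2i}}{\partial u_2} & \cdots & \dfrac{\partial h_{2i}}{\partial u_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial h_{ni}}{\partial u_1} & \dfrac{\partial h_{ni}}{\partial u_2} & \cdots & \dfrac{\partial h_{ni}}{\partial u_n} \end{vmatrix}$$

is different from zero in the range of the transformation. Then the joint
pdf of $U$ is given by

$$w(u_1, u_2, \dots, u_n) = \sum_{i=1}^{k} |J_i|\, f\bigl(h_{1i}(u_1, \dots, u_n), \dots, h_{ni}(u_1, \dots, u_n)\bigr).$$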


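Example 6. Let $X_1, X_2, X_3$ be iid rv's with common exponential density
function

$$f(x) = \begin{cases} e^{-x} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Also, let

$$Y_1 = X_1 + X_2 + X_3, \quad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \quad Y_3 = \frac{X_1}{X_1 + X_2}.$$

Then

$$x_1 = y_1 y_2 y_3, \quad x_2 = y_1 y_2 - x_1 = y_1 y_2 (1 - y_3), \quad \text{and} \quad x_3 = y_1 - y_1 y_2 = y_1 (1 - y_2).$$

The Jacobian of transformation is given by

$$J = \begin{vmatrix} y_2 y_3 & y_1 y_3 & y_1 y_2 \\ y_2 (1 - y_3) & y_1 (1 - y_3) & -y_1 y_2 \\ 1 - y_2 & -y_1 & 0 \end{vmatrix} = -y_1^2 y_2.$$

Note that $0 < y_1 < \infty$, $0 < y_2 < 1$, and $0 < y_3 < 1$. Thus the joint pdf of
$Y_1, Y_2, Y_3$ is given by

$$\begin{aligned}
w(y_1, y_2, y_3) &= y_1^2 y_2 e^{-y_1} \\
&= (2 y_2) \left(\tfrac{1}{2} y_1^2 e^{-y_1}\right), \quad 0 < y_1 < \infty,\ 0 < y_2, y_3 < 1.
\end{aligned}$$

It follows that $Y_1$, $Y_2$, and $Y_3$ are independent.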
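
The Jacobian in Example 6 is easy to get wrong by hand, so a spot check at a rational point is a cheap safeguard. The sketch below is an illustration only: the helper names `det3` and `jacobian` and the test point are assumptions introduced here.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def jacobian(y1, y2, y3):
    """Matrix of partials d(x1, x2, x3)/d(y1, y2, y3) for the inverse map
    x1 = y1*y2*y3, x2 = y1*y2*(1 - y3), x3 = y1*(1 - y2)."""
    return [
        [y2 * y3, y1 * y3, y1 * y2],
        [y2 * (1 - y3), y1 * (1 - y3), -y1 * y2],
        [1 - y2, -y1, 0],
    ]


# An arbitrary point with 0 < y1 < infinity and 0 < y2, y3 < 1 (illustrative).
y1, y2, y3 = Fraction(5, 2), Fraction(1, 3), Fraction(3, 4)
J = det3(jacobian(y1, y2, y3))
```

At any such point `J` should equal `-y1**2 * y2`, so that $|J| = y_1^2 y_2$ as used in the joint pdf.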
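Example 7. Let $X_1, X_2$ be independent rv's with common density given by

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Let $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$. Then $x_1 = (y_1 + y_2)/2$, $x_2 = (y_1 - y_2)/2$, and the Jacobian of the transformation
is given by

$$J = \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{vmatrix} = -\frac{1}{2},$$

and the joint density of $Y_1, Y_2$ (Fig. 1) is given by

$$\begin{aligned}
f_{Y_1, Y_2}(y_1, y_2) &= \frac{1}{2}\, f\!\left(\frac{y_1 + y_2}{2}\right) f\!\left(\frac{y_1 - y_2}{2}\right) \quad \text{if } 0 < \frac{y_1 + y_2}{2} < 1,\ 0 < \frac{y_1 - y_2}{2} < 1, \\
&= \frac{1}{2} \quad \text{if } (y_1, y_2) \in \{0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\}.
\end{aligned}$$

Fig. 1

The marginal pdf's of $Y_1$ and $Y_2$ are given by

$$f_{Y_1}(y_1) = \begin{cases} \displaystyle\int_{-y_1}^{y_1} \tfrac{1}{2}\, dy_2 = y_1, & 0 < y_1 \le 1, \\[1.5ex] \displaystyle\int_{y_1 - 2}^{2 - y_1} \tfrac{1}{2}\, dy_2 = 2 - y_1, & 1 < y_1 < 2, \\[1.5ex] 0, & \text{otherwise;} \end{cases}$$

$$f_{Y_2}(y_2) = \begin{cases} \displaystyle\int_{-y_2}^{y_2 + 2} \tfrac{1}{2}\, dy_1 = y_2 + 1, & -1 < y_2 \le 0, \\[1.5ex] \displaystyle\int_{y_2}^{2 - y_2} \tfrac{1}{2}\, dy_1 = 1 - y_2, & 0 < y_2 < 1, \\[1.5ex] 0, & \text{otherwise.} \end{cases}$$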
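Example 8. Let $X_1, X_2, X_3$ be iid rv's with common pdf

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$$

Let $Y_1 = (X_1 - X_2)/\sqrt{2}$, $Y_2 = (X_1 + X_2 - 2X_3)/\sqrt{6}$, and
$Y_3 = (X_1 + X_2 + X_3)/\sqrt{3}$. Then

$$x_1 = \frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}}, \quad x_2 = -\frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}}, \quad x_3 = -\frac{2 y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}}.$$

The Jacobian of transformation is given by

$$J = \begin{vmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \\[1.5ex] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \\[1.5ex] 0 & -\dfrac{2}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \end{vmatrix} = 1.$$

The joint pdf of $X_1, X_2, X_3$ is given by

$$g(x_1, x_2, x_3) = \frac{1}{(\sqrt{2\pi})^3} \exp\left\{-\frac{x_1^2 + x_2^2 + x_3^2}{2}\right\}, \quad x_1, x_2, x_3 \in \mathcal{R}.$$

It is easily checked that

$$x_1^2 + x_2^2 + x_3^2 = y_1^2 + y_2^2 + y_3^2,$$

so that the joint pdf of $Y_1, Y_2, Y_3$ is given by

$$w(y_1, y_2, y_3) = \frac{1}{(\sqrt{2\pi})^3} \exp\left\{-\frac{y_1^2 + y_2^2 + y_3^2}{2}\right\}.$$

It follows that $Y_1, Y_2, Y_3$ are also iid rv's with common pdf $f$.


In Example 8 the transformation used is orthogonal and is known as
Helmert's transformation. In fact, we will show in Section 5.4 that under
orthogonal transformations iid rv's with pdf $f$ defined above are transformed
into iid rv's with the same pdf.
Actually it is easily verified that

$$y_1^2 + y_2^2 = \sum_{j=1}^{3} \left(x_j - \frac{x_1 + x_2 + x_3}{3}\right)^2.$$

We have therefore proved that $(X_1 + X_2 + X_3)$ is independent of
$\sum_{j=1}^{3} \{X_j - [(X_1 + X_2 + X_3)/3]\}^2$. This is a very important result in
mathematical statistics, and we will return to it in Section 7.5.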
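
The two identities used in Example 8 (orthogonality of the transformation, and $y_1^2 + y_2^2$ equal to the sum of squared deviations from the mean) can be verified numerically. This is a minimal sketch; the matrix name `H`, the helpers `helmert` and `is_orthogonal`, and the sample point are assumptions made for illustration.

```python
import math

# Rows give Y1, Y2, Y3 as linear combinations of X1, X2, X3 (Example 8).
H = [
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],
    [1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)],
]


def helmert(x):
    """Apply the 3 x 3 transformation y = Hx."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


def is_orthogonal(m, tol=1e-12):
    """Check that the rows of m are orthonormal up to rounding error."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            dot = sum(m[i][k] * m[j][k] for k in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


x = [1.5, -0.3, 2.2]                      # arbitrary sample point
y = helmert(x)
mean = sum(x) / 3
sum_sq_dev = sum((xi - mean) ** 2 for xi in x)
```

Here `y[0]**2 + y[1]**2` should agree with `sum_sq_dev`, and the sums of squares of `x` and `y` should coincide, up to floating-point error.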


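Example 9. Let $(X, Y)$ be a bivariate normal rv with joint pdf

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2(1 - \rho^2)^{1/2}} \exp\left\{-\frac{1}{2(1 - \rho^2)} \left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]\right\},$$

$-\infty < x < \infty$, $-\infty < y < \infty$; $\mu_1 \in \mathcal{R}$, $\mu_2 \in \mathcal{R}$;
and $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$.

Let

$$U_1 = \sqrt{X^2 + Y^2}, \quad U_2 = \frac{X}{Y}.$$

If $u_1 > 0$, then

$$\sqrt{x^2 + y^2} = u_1 \quad \text{and} \quad \frac{x}{y} = u_2$$

have two solutions:

$$x_1 = \frac{u_1 u_2}{\sqrt{1 + u_2^2}}, \quad y_1 = \frac{u_1}{\sqrt{1 + u_2^2}}, \quad \text{and} \quad x_2 = -x_1, \quad y_2 = -y_1,$$

for any $u_2 \in \mathcal{R}$. The Jacobians are given by

$$J_1 = J_2 = \begin{vmatrix} \dfrac{u_2}{\sqrt{1 + u_2^2}} & \dfrac{u_1}{(1 + u_2^2)^{3/2}} \\[2ex] \dfrac{1}{\sqrt{1 + u_2^2}} & -\dfrac{u_1 u_2}{(1 + u_2^2)^{3/2}} \end{vmatrix} = -\frac{u_1}{1 + u_2^2}.$$

It follows from the result in Remark 2 that the joint pdf of $(U_1, U_2)$ is
given by

$$w(u_1, u_2) = \begin{cases} \dfrac{u_1}{1 + u_2^2} \left[f\!\left(\dfrac{u_1 u_2}{\sqrt{1 + u_2^2}}, \dfrac{u_1}{\sqrt{1 + u_2^2}}\right) + f\!\left(\dfrac{-u_1 u_2}{\sqrt{1 + u_2^2}}, \dfrac{-u_1}{\sqrt{1 + u_2^2}}\right)\right] & \text{if } u_1 > 0,\ u_2 \in \mathcal{R}, \\[2ex] 0 & \text{otherwise.} \end{cases}$$

In the special case where $\mu_1 = \mu_2 = 0$, $\rho = 0$, and $\sigma_1 = \sigma_2 = \sigma$, we have

$$f(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2},$$

so that $X$ and $Y$ are independent. Moreover,

$$f(x, y) = f(-x, -y),$$

and it follows that

$$w(u_1, u_2) = \begin{cases} \dfrac{1}{2\pi\sigma^2}\, \dfrac{2u_1}{1 + u_2^2}\, e^{-u_1^2/2\sigma^2}, & u_1 > 0,\ -\infty < u_2 < \infty, \\[1.5ex] 0 & \text{otherwise.} \end{cases}$$

Since

$$w(u_1, u_2) = \frac{1}{\pi(1 + u_2^2)} \cdot \frac{u_1}{\sigma^2}\, e^{-u_1^2/2\sigma^2},$$

it follows that $U_1$ and $U_2$ are independent with marginal pdf's given by

$$w_1(u_1) = \begin{cases} \dfrac{u_1}{\sigma^2}\, e^{-u_1^2/2\sigma^2}, & u_1 > 0, \\[1ex] 0, & u_1 \le 0, \end{cases}$$

and

$$w_2(u_2) = \frac{1}{\pi(1 + u_2^2)}, \quad -\infty < u_2 < \infty,$$

respectively.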


epe application of the result in ue 2 will appear in Theo- 


FUNCTIONS OF RANDOM VECTORS 141- 


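Theorem 7. Let $(X, Y)$ be an rv of the continuous type with pdf $f$. Let

$$Z = X + Y, \quad U = X - Y, \quad V = XY;$$

and, if $P\{Y = 0\} = 0$, let $W = X/Y$. Then the pdf's of $Z$, $U$, $V$, and $W$
are, respectively, given by

$$f_Z(z) = \int_{-\infty}^{\infty} f(x, z - x)\, dx, \tag{14}$$

$$f_U(u) = \int_{-\infty}^{\infty} f(u + y, y)\, dy, \tag{15}$$

$$f_V(v) = \int_{-\infty}^{\infty} f\!\left(x, \frac{v}{x}\right) \frac{1}{|x|}\, dx, \tag{16}$$

$$f_W(w) = \int_{-\infty}^{\infty} f(xw, x)\, |x|\, dx. \tag{17}$$

Proof. The proof is left as an exercise.


Corollary. If $X$ and $Y$ are independent with pdf's $f_1$ and $f_2$, respectively,
then

$$f_Z(z) = \int_{-\infty}^{\infty} f_1(x) f_2(z - x)\, dx, \tag{18}$$

$$f_U(u) = \int_{-\infty}^{\infty} f_1(u + y) f_2(y)\, dy, \tag{19}$$

$$f_V(v) = \int_{-\infty}^{\infty} f_1(x) f_2\!\left(\frac{v}{x}\right) \frac{1}{|x|}\, dx, \tag{20}$$

$$f_W(w) = \int_{-\infty}^{\infty} f_1(xw) f_2(x)\, |x|\, dx. \tag{21}$$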
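
Formulas (18) and (21) can be approximated numerically when no closed form is at hand. The sketch below uses a plain midpoint rule on a truncated interval; the helper names `integrate`, `pdf_sum`, and `pdf_ratio`, the truncation to $(-50, 50)$, and the choice of the standard exponential density are all assumptions made for illustration. For two independent standard exponential rv's the known answers are $z e^{-z}$ for the sum and $(1 + w)^{-2}$ for the ratio.

```python
import math


def exp_pdf(x):
    """Standard exponential density (illustrative choice of f1 = f2)."""
    return math.exp(-x) if x > 0 else 0.0


def integrate(g, lo=-50.0, hi=50.0, steps=20000):
    """Midpoint-rule approximation of the integral of g over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


def pdf_sum(f1, f2, z):
    """Approximate (18): density of X + Y at z for independent X, Y."""
    return integrate(lambda x: f1(x) * f2(z - x))


def pdf_ratio(f1, f2, w):
    """Approximate (21): density of X / Y at w for independent X, Y."""
    return integrate(lambda x: f1(x * w) * f2(x) * abs(x))


if __name__ == "__main__":
    print(pdf_sum(exp_pdf, exp_pdf, 1.5), 1.5 * math.exp(-1.5))
    print(pdf_ratio(exp_pdf, exp_pdf, 1.0), 0.25)
```

The accuracy depends on the step size and on how quickly the densities decay; the truncation bounds would need adjusting for heavy-tailed densities.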


Remark 3. Let $F$ and $G$ be two absolutely continuous df's; then

$$H(x) = \int_{-\infty}^{\infty} F(x - y)\, G'(y)\, dy = \int_{-\infty}^{\infty} G(x - y)\, F'(y)\, dy$$

is also an absolutely continuous df with pdf

$$H'(x) = \int_{-\infty}^{\infty} F'(x - y)\, G'(y)\, dy = \int_{-\infty}^{\infty} G'(x - y)\, F'(y)\, dy.$$

If

$$F(x) = \sum_{k} p_k\, \varepsilon(x - x_k) \quad \text{and} \quad G(x) = \sum_{j} q_j\, \varepsilon(x - y_j)$$

are two df's, then

$$H(x) = \sum_{k} \sum_{j} p_k q_j\, \varepsilon(x - x_k - y_j)$$

is also a df of an rv of the discrete type. The df $H$ is called the convolution
of $F$ and $G$, and we write $H = F * G$. Clearly the operation is commutative
and associative; that is, if $F_1, F_2, F_3$ are df's, $F_1 * F_2 = F_2 * F_1$ and
$(F_1 * F_2) * F_3 = F_1 * (F_2 * F_3)$. In this terminology, if $X$ and $Y$ are independent
rv's with df's $F$ and $G$, respectively, $X + Y$ has the convolution df
$H = F * G$. Extension to an arbitrary number of independent rv's is obvious.


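Example 10. Let $X_1, X_2, \dots, X_n$ be iid rv's with common density function
$f$. Also, let

$$M_n = \max(X_1, X_2, \dots, X_n), \quad N_n = \min(X_1, X_2, \dots, X_n),$$

and write

$$R_n = M_n - N_n.$$

Then $R_n$ is also an rv called the range of $X_1, X_2, \dots, X_n$. By the corollary to
Theorem 4 the density of $R_n$ is given by

$$f_{R_n}(r) = \int_{-\infty}^{\infty} u(r + y, y)\, dy = \begin{cases} n(n - 1) \displaystyle\int_{-\infty}^{\infty} [F(r + y) - F(y)]^{n-2} f(r + y) f(y)\, dy & (r > 0), \\[1.5ex] 0 & (r \le 0), \end{cases}$$

where $u$ is the joint density (12) of $M_n$ and $N_n$, and $F$ is the df of $X_1$. If, in particular,

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise,} \end{cases}$$

we see that

$$f_{R_n}(r) = \begin{cases} 0 & \text{for } r < 0 \text{ or } r > 1, \\ n(n - 1)\, r^{n-2} (1 - r) & \text{for } 0 < r < 1. \end{cases}$$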


Theorem 8. Let $X$ and $Y$ be independent and identically distributed. Then
$X - Y$ is a symmetric rv.


Proof. We prove the theorem for the case in which $X$ and $Y$ are of the discrete
type and leave the reader to supply the proof for the continuous case.

We have

$$\begin{aligned}
P\{X - Y \le x\} &= \sum_{i} P\{-Y \le x - x_i\}\, P\{X = x_i\} \\
&= \sum_{i} P\{-X \le x - x_i\}\, P\{X = x_i\} \\
&= 1 - \sum_{i} P\{-X > x - x_i\}\, P\{X = x_i\}.
\end{aligned}$$

Therefore

$$\begin{aligned}
P\{X - Y \le -x\} &= 1 - \sum_{i} P\{-X > -x - x_i\}\, P\{X = x_i\} \\
&= 1 - \sum_{i} P\{X < x + x_i\}\, P\{Y = x_i\} \\
&= 1 - P\{X - Y < x\} \\
&= 1 - P\{X - Y \le x\} + P\{X - Y = x\},
\end{aligned}$$

and it follows that $X - Y$ is a symmetric rv.
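
Theorem 8 can be illustrated on a small discrete distribution by computing the pmf of $X - Y$ exactly and checking that it is unchanged under $w \mapsto -w$. This is only a sketch: the helper name `diff_pmf` and the particular three-point pmf are assumptions chosen for illustration.

```python
from fractions import Fraction


def diff_pmf(pmf):
    """pmf of X - Y for iid X, Y, where pmf maps each value to its probability."""
    out = {}
    for x, px in pmf.items():
        for y, py in pmf.items():
            out[x - y] = out.get(x - y, 0) + px * py
    return out


# A deliberately asymmetric pmf (illustrative values, not from the text).
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
d = diff_pmf(pmf)
```

Even though `pmf` itself is far from symmetric, `d[w] == d[-w]` holds for every attained value `w`.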




Remark 4. An rv $X$ has a symmetric df if and only if $X$ and $-X$ have the
same distribution. If $X$ and $Y$ are identically distributed but not independent,
let us write $Z = X - Y$. Since $X$ and $Y$ are identically distributed
and $-Z = Y - X$, there is a temptation to conclude that $Z$ and $-Z$ have
the same distribution, so that $Z$ is a symmetric rv. That this is not true is
seen in the following example.


Example 11. Let (X, Y) have the joint pmf given in the following table: 


| 
| 
| 


Here X and Y are identically distributed but not independent. The pmf of 
Z = X — Y is given by ) 


=| Бе Sh ge 


Se Sh SESE) ma 
SM о OND 
BP) SE Sr ru 


7 P{Z = 2} 


-2 P 
21 b 
0 à 
2 1 à 
2 à 


Z and —Z do not have the same distribution, and therefore Z is not sym- 
metric. 


It is easy to find examples of jointly distributed rv's X, Y such that X and 
Y are identically distributed and X — Y is symmetric. 


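Example 12. Let $(X, Y)$ be jointly distributed with density

$$f(x, y) = \begin{cases} (1 + \alpha) - 2\alpha(x + y - 2xy), & 0 < x < 1,\ 0 < y < 1, \\ 0 & \text{otherwise,} \end{cases}$$

where $|\alpha| < 1$. The marginal pdf's of $X$ and $Y$ are the same, namely,

$$g(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

If we write $U = X - Y$, the density of $U$ is easily shown to be

$$f_U(u) = \begin{cases} \left(1 + \dfrac{\alpha}{3}\right) + (1 + \alpha)u - \dfrac{2\alpha}{3}u^3, & -1 < u \le 0, \\[1.5ex] \left(1 + \dfrac{\alpha}{3}\right) - (1 + \alpha)u + \dfrac{2\alpha}{3}u^3, & 0 < u < 1, \\[1.5ex] 0, & \text{otherwise.} \end{cases}$$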

Clearly U is symmetric. 


The following result gives a condition under which rv's $X$ and $Y$ are dependent
and $X - Y$ has a symmetric distribution.


Theorem 9. Let $(X, Y)$ be a jointly distributed rv such that the joint distribution
is symmetric in the arguments. Then $X - Y$ has a symmetric distribution.


Proof. The proof is left as an exercise.
Corollary. If X and Y are iid, X — Y has a symmetric distribution. 


It is possible to improve Theorem 9 in special cases, but we will not do so 
in detail here. The following example suffices. 
Example 13. Let (X, Y) have the joint density 
1- X E = 
дух, yes ee у), 1<х<1, l«y«l, 
otherwise. 
Then X and Y both have the same pdf: 


w= (i if -1<x<1, 


l otherwise. 


The pdf of U — X — Y is given by 
= 43. 
{+- tut Oo 1$ ~ х0 0<и<2, 


folu) = [red азу + —-2<u<0, 


0, Ip otherwise. . 
Clearly U is symmetric.
Definition. Let $X$ be an rv, and let $X'$ be an rv that is independent of $X$
and has the same distribution as $X$. We call the rv

$$X^s = X - X'$$

the symmetrized $X$.
The technique of symmetrization is an important tool in the study of limit
theorems.
Remark 5. Clearly $X^s$ is symmetric about the origin so that

$$P\{X^s \ge 0\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X^s \le 0\} \ge \tfrac{1}{2}.$$

Remark 6. If $EX$ exists, so does $EX^s$, since

$$E|X^s| \le E|X| + E|X'| = 2E|X| < \infty,$$

and we have $EX^s = 0$.
Remark 7. Not every symmetric distribution is the result of a symmetrization
procedure. Thus $P\{X = \pm 1\} = \tfrac{1}{2}$ defines a symmetric rv that cannot
be obtained by symmetrization.


Theorem 10.
(a) $P\{|X^s| > \varepsilon\} \le 2P\{|X| > \varepsilon/2\}$.
(b) If $a \ge 0$ such that $P\{X \ge a\} \le 1 - p$ and $P\{X \le -a\} \le 1 - p$,
then $P\{|X^s| \ge \varepsilon\} \ge p\,P\{|X| > a + \varepsilon\}$.


Proof. The reader is asked to construct a proof of Theorem 10.


We conclude this section by noting that the mgf can frequently be used to 
find distributions of functions of random vectors. The following result is of 
great importance. 


Theorem 11. Let $X_1, X_2, \dots, X_n$ be independent rv's, and suppose that
the mgf of $X_i$ exists for each $i = 1, 2, \dots, n$. Then the mgf of $S_n = X_1 + X_2
+ \cdots + X_n$ exists and satisfies

$$M_{S_n}(t) = \prod_{i=1}^{n} M_{X_i}(t). \tag{22}$$

Proof. We have

$$M_{S_n}(t) = E e^{tS_n} = E \prod_{i=1}^{n} e^{tX_i} = \prod_{i=1}^{n} E e^{tX_i},$$

since $e^{tX_1}, \dots, e^{tX_n}$ are independent.


In particular, if $X_1, X_2, \dots, X_n$ are iid with common law $\mathcal{L}(X)$, then

$$M_{X_i}(t) = M_X(t) \quad \text{for } i = 1, 2, \dots, n,$$

and we have

$$M_{S_n}(t) = [M_X(t)]^n.$$


Remark 8. The converse of Theorem 11 does not hold. We leave the reader 
to construct an example to illustrate this fact. 


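Example 14. Let $X_1, X_2, \dots, X_m$ be iid rv's with common pmf

$$P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \dots, n;\ 0 < p < 1.$$

Then the mgf of $X_i$ is given by

$$M(t) = (1 - p + pe^t)^n.$$

It follows that the mgf of $S_m = X_1 + X_2 + \cdots + X_m$ is

$$M_{S_m}(t) = \prod_{i=1}^{m} (1 - p + pe^t)^n = (1 - p + pe^t)^{nm},$$

and we see that $S_m$ has the pmf

$$P\{S_m = s\} = \binom{mn}{s} p^s (1 - p)^{mn-s}, \quad s = 0, 1, 2, \dots, mn.$$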
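
Relation (22) can be checked numerically in the setting of Example 14 by convolving pmf's and comparing moment generating functions at a fixed $t$. The following is a sketch under stated assumptions: the helper names `binom_pmf`, `mgf_from_pmf`, and `convolve`, and the values $n = 3$, $p = 0.4$, $m = 2$, $t = 0.7$, are illustrative and not from the text.

```python
import math


def binom_pmf(n, p):
    """Binomial pmf as a list indexed by k = 0, 1, ..., n."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def mgf_from_pmf(pmf, t):
    """E exp(tX) for an rv supported on 0, 1, ..., len(pmf) - 1."""
    return sum(q * math.exp(t * k) for k, q in enumerate(pmf))


def convolve(px, py):
    """pmf of the sum of two independent rv's supported on 0, 1, 2, ..."""
    out = [0.0] * (len(px) + len(py) - 1)
    for i, a in enumerate(px):
        for j, b in enumerate(py):
            out[i + j] += a * b
    return out


n, p, t = 3, 0.4, 0.7                     # illustrative values
px = binom_pmf(n, p)
ps = convolve(px, px)                     # pmf of S_2 = X_1 + X_2
lhs = mgf_from_pmf(ps, t)                 # mgf of the sum
rhs = mgf_from_pmf(px, t) ** 2            # product of the two mgf's
```

Both `lhs` and `rhs` should agree with the closed form `(1 - p + p * math.exp(t)) ** (2 * n)` up to rounding.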


Example 15. Let $X$ have the pdf

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

The mgf of $X$ is given by

$$M(t) = \frac{e^t - 1}{t}.$$

Let $Y = aX + b$, where $a$ and $b$ are constants. Then

$$M_Y(t) = E e^{t(aX + b)} = e^{bt} M(at) = e^{bt}\, \frac{e^{at} - 1}{at}.$$

This is the mgf of the uniform distribution on $(b, a + b)$ with pdf

$$f_Y(y) = \begin{cases} \dfrac{1}{a}, & b < y < a + b, \\[1ex] 0, & \text{otherwise,} \end{cases}$$

as can easily be checked.
From these examples it is clear that to use this technique effectively one
must be able to recognize the mgf of the function under consideration. In
Chapter 5 we will study a number of commonly occurring probability distributions
and derive their mgf's (whenever they exist). We will have occasion
to use Theorem 11 quite frequently.

For integer-valued rv's one can sometimes use pgf's to compute the distribution
of certain functions of a random vector. See, for example, Problem 9.


PROBLEMS 4.4 


1. Let $F$ be a df and $\varepsilon$ be any positive real number. Show that
y(x) = 1. £z F(x) dx 


and 


are also distribution functions. 
2. Let $X$, $Y$ be iid rv's with common pdf

$$f(x) = \begin{cases} e^{-x} & \text{if } x > 0, \\ 0 & \text{if } x \le 0. \end{cases}$$

(a) Find the pdf of rv's $X + Y$, $X - Y$, $XY$, $X/Y$, $\min(X, Y)$, $\max(X, Y)$,
$\min(X, Y)/\max(X, Y)$, and $X/(X + Y)$.
(b) Let $U = X + Y$ and $V = X - Y$. Find the conditional pdf of $V$, given $U = u$,
for some fixed $u > 0$.
(c) Show that $U$ and $Z = X/(X + Y)$ are independent.
3. Let $X$ and $Y$ be independent rv's defined on the space $(\Omega, \mathcal{S}, P)$. Let $X$ be
uniformly distributed on $(-a, a)$, $a > 0$, and $Y$ be an rv of the continuous type
with density $f$, where $f$ is continuous and positive on $\mathcal{R}$. Let $F$ be the df of $Y$. If
$u_0 \in (-a, a)$ is a fixed number, show that

$$f_{Y \mid X+Y}(y \mid u_0) = \begin{cases} \dfrac{f(y)}{F(u_0 + a) - F(u_0 - a)} & \text{if } u_0 - a < y < u_0 + a, \\[1.5ex] 0 & \text{otherwise,} \end{cases}$$

where $f_{Y \mid X+Y}(y \mid u_0)$ is the conditional density function of $Y$, given $X + Y = u_0$.
4. Let X and Y be iid rv's with common pdf 
:e qt: $ © 
ОТ. ошмш | 
Find the pdf's of rv's $XY$, $X/Y$, $\min(X, Y)$, $\max(X, Y)$, and
$\min(X, Y)/\max(X, Y)$.
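5. Let $X_1, X_2, X_3$ be iid rv's with common density function

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Show that the pdf of $U = X_1 + X_2 + X_3$ is given by

$$g(u) = \begin{cases} \dfrac{u^2}{2}, & 0 \le u < 1, \\[1ex] 3u - u^2 - \dfrac{3}{2}, & 1 \le u < 2, \\[1ex] \dfrac{(u - 3)^2}{2}, & 2 \le u \le 3, \\[1ex] 0, & \text{elsewhere.} \end{cases}$$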


6. Let $X$ and $Y$ be independent rv's with common geometric pmf

$$P\{X = k\} = \pi(1 - \pi)^k, \quad k = 0, 1, 2, \dots;\ 0 < \pi < 1.$$

Also, let $M = \max(X, Y)$. Find the joint distribution of $M$ and $X$, the marginal
distribution of $M$, and the conditional distribution of $X$, given $M$.


7. Let $X$ be a nonnegative rv of the continuous type. The integral part, $Y$, of $X$ is
distributed with pmf $P\{Y = k\} = \lambda^k e^{-\lambda}/k!$, $k = 0, 1, 2, \dots$, $\lambda > 0$; and the fractional
part, $Z$, of $X$ has pdf $f_Z(z) = 1$ if $0 \le z \le 1$, and $= 0$ otherwise. Find the
pdf of $X$, and the mean and variance of $X$ assuming that $Y$ and $Z$ are independent.


8. Let $X$ and $Y$ be independent rv's. If at least one of $X$ and $Y$ is of the continuous
type, show that $X + Y$ is also continuous. What if $X$ and $Y$ are not independent?


9. Let $X$ and $Y$ be independent integral rv's. Show that

$$P(s) = P_X(s)\, P_Y(s),$$

where $P$, $P_X$, and $P_Y$, respectively, are the pgf's of $X + Y$, $X$, and $Y$.


10. Let $X$ and $Y$ be independent nonnegative rv's of the continuous type with
pdf's $f$ and $g$, respectively. Let $f(x) = e^{-x}$ if $x > 0$, and $= 0$ if $x \le 0$, and let $g$ be
arbitrary. Show that the mgf $M(t)$ of $Y$, which is assumed to exist, has the property
that the df of $X/Y$ is $1 - M(-t)$.


11. Let $X$, $Y$, $Z$ have the joint pdf
_f[l+x+y+z* if O<x,0<y,0<z, 


Find the pdf of U = X + Y +Z. 


12. Prove Theorem 9. In view of Example 10 can you suggest an improvement 
in Theorem 9? 


13. Prove Theorem 10. 
14. Let $X$ and $Y$ be iid rv's with common pdf
a [EVI 1 е0 х 0), 
i Fx) 8 Т х<0. 
Find the pdf of Z = XY.
15. Let $X$ and $Y$ be iid rv's with common pdf $f$ defined in Example 8. Find the
joint pdf of $U$ and $V$ in the following cases:

(a U= XY V = tan (X/Y), —(z/2)« V < (x/2). 

(b) U-(X4YJ2, V = (X — Yyj2. 


16. Prove Theorem 7. 


17. Construct an example to show that even when the mgf of X + Y can be written 
as a product of the mgf of X and the mgf of Y, X and Y need not be independent. 


4.5 ORDER STATISTICS AND THEIR DISTRIBUTIONS


Let $(X_1, X_2, \dots, X_n)$ be an n-dimensional random vector, and $(x_1, x_2, \dots, x_n)$
be an n-tuple assumed by $(X_1, X_2, \dots, X_n)$. Then let us arrange $x_1, x_2, \dots, x_n$
in increasing order of magnitude so that

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)},$$

where $x_{(1)} = \min(x_1, x_2, \dots, x_n)$, $x_{(2)}$ is the second smallest value in $x_1,
x_2, \dots, x_n$, and so on, $x_{(n)} = \max(x_1, x_2, \dots, x_n)$. If any two $x_i$, $x_j$ are equal,
their order does not matter.


Definition. The function $X_{(k)}$ of $(X_1, X_2, \dots, X_n)$ that takes on the value
$x_{(k)}$ in each possible sequence $(x_1, x_2, \dots, x_n)$ of values assumed by
$(X_1, X_2, \dots, X_n)$ is known as the kth-order statistic or the statistic of order $k$.
$\{X_{(1)}, X_{(2)}, \dots, X_{(n)}\}$ is called the set of order statistics for $(X_1, X_2, \dots, X_n)$.


Example 1. Let $X_1, X_2, X_3$ be three rv's of the discrete type. Also, let $X_1$,
$X_3$ take on values 0, 1, and $X_2$ take on values 1, 2, 3. Then the random
vector $(X_1, X_2, X_3)$ assumes these triplets of values: (0, 1, 0), (0, 2, 0),
(0, 3, 0), (0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 1, 0), (1, 2, 0), (1, 3, 0), (1, 1, 1),
(1, 2, 1), (1, 3, 1); $X_{(1)}$ takes on values 0, 1; $X_{(2)}$ takes on values 0, 1; and
$X_{(3)}$ takes on values 1, 2, 3.
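
The values listed in Example 1 can be reproduced by enumerating the twelve triplets and sorting each one. This is a minimal sketch; the variable names `triplets`, `ordered`, and `values` are chosen here for illustration.

```python
from itertools import product

# All triplets (x1, x2, x3) with x1, x3 in {0, 1} and x2 in {1, 2, 3}.
triplets = list(product([0, 1], [1, 2, 3], [0, 1]))

# Sorting a triplet gives (x_(1), x_(2), x_(3)).
ordered = [tuple(sorted(t)) for t in triplets]

# Distinct values taken by each order statistic.
values = [sorted({o[k] for o in ordered}) for k in range(3)]

if __name__ == "__main__":
    for k, v in enumerate(values, start=1):
        print(f"X_({k}) takes values {v}")
```

Note that different triplets can give the same ordered triplet, which is the non-uniqueness exploited later in the proof of Theorem 2.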


In Section 4.4 we showed that $X_{(1)}$ and $X_{(n)}$ are themselves rv's.


Theorem 1. Let $(X_1, X_2, \dots, X_n)$ be an n-dimensional rv. Let $X_{(k)}$,
$1 \le k \le n$, be the order statistic of order $k$. Then $X_{(k)}$ is also an rv.


Proof. The proof of Theorem 1 is left as an exercise.


In the following we assume that $X_1, X_2, \dots, X_n$ are iid rv's of the continuous
type with pdf $f$. Let $\{X_{(1)}, X_{(2)}, \dots, X_{(n)}\}$ be the set of order statistics for
$X_1, X_2, \dots, X_n$. Since the $X_i$ are all continuous type rv's, it follows with probability
1 that

$$X_{(1)} < X_{(2)} < \cdots < X_{(n)}.$$

We first compute the joint pdf of $(X_{(1)}, X_{(2)}, \dots, X_{(n)})$.
Theorem 2. The joint pdf of $(X_{(1)}, X_{(2)}, \dots, X_{(n)})$ is given by

$$g(x_{(1)}, x_{(2)}, \dots, x_{(n)}) = \begin{cases} n! \displaystyle\prod_{i=1}^{n} f(x_{(i)}), & x_{(1)} < x_{(2)} < \cdots < x_{(n)}, \\[1.5ex] 0, & \text{otherwise.} \end{cases} \tag{1}$$


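Proof. The transformation from $(X_1, X_2, \dots, X_n)$ to $(X_{(1)}, X_{(2)}, \dots, X_{(n)})$
is not one-to-one. In fact, there is a total of $n!$ possible arrangements of
$x_1, x_2, \dots, x_n$ in increasing order of magnitude. Thus there are $n!$ inverses
to the transformation. For example, one of the $n!$ permutations might be

$$x_4 < x_1 < x_{n-1} < x_3 < \cdots < x_n < x_2.$$

Then the corresponding inverse is

$$x_4 = x_{(1)},\quad x_1 = x_{(2)},\quad x_{n-1} = x_{(3)},\quad x_3 = x_{(4)},\quad \dots,\quad x_n = x_{(n-1)},\quad x_2 = x_{(n)}.$$

The Jacobian of this transformation is the determinant of an $n \times n$ identity
matrix with rows rearranged, since each $x_{(i)}$ equals one and only one of
$x_1, x_2, \dots, x_n$. Therefore $J = \pm 1$, and

$$f(x_{(2)}, x_{(n)}, x_{(4)}, x_{(1)}, \dots, x_{(3)}, x_{(n-1)})\, |J| = \prod_{i=1}^{n} f(x_{(i)}), \quad x_{(1)} < x_{(2)} < \cdots < x_{(n)},$$

where $f$ on the left denotes the joint pdf of $(X_1, X_2, \dots, X_n)$.
The same expression holds for each of the $n!$ arrangements.
It follows (see Remark 4.4.2) that

$$g(x_{(1)}, x_{(2)}, \dots, x_{(n)}) = \sum_{\substack{\text{all } n! \\ \text{inverses}}} \prod_{i=1}^{n} f(x_{(i)}) = \begin{cases} n!\, f(x_{(1)}) f(x_{(2)}) \cdots f(x_{(n)}) & \text{if } x_{(1)} < x_{(2)} < \cdots < x_{(n)}, \\ 0 & \text{otherwise.} \end{cases}$$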


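Example 2. Let $X_1, X_2, \dots, X_n$ be iid rv's with common pdf

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Then the joint pdf of $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ is

$$g(y_1, y_2, \dots, y_n) = \begin{cases} n!, & 0 < y_1 < y_2 < \cdots < y_n < 1, \\ 0, & \text{otherwise.} \end{cases}$$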




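Example 3. Let $X_1, X_2, X_3, X_4$ be iid rv's with pdf $f$. The joint pdf of $X_{(1)}$,
$X_{(2)}$, $X_{(3)}$, $X_{(4)}$ is

$$g(y_1, y_2, y_3, y_4) = \begin{cases} 4!\, f(y_1) f(y_2) f(y_3) f(y_4), & y_1 < y_2 < y_3 < y_4, \\ 0, & \text{otherwise.} \end{cases}$$

Let us compute the marginal pdf of $X_{(2)}$. We have

$$\begin{aligned}
g_2(y_2) &= 4! \iiint_{y_1 < y_2 < y_3 < y_4} f(y_1) f(y_2) f(y_3) f(y_4)\, dy_1\, dy_3\, dy_4 \\
&= 4!\, f(y_2) \int_{-\infty}^{y_2} \left\{\int_{y_2}^{\infty} \left[\int_{y_3}^{\infty} f(y_4)\, dy_4\right] f(y_3)\, dy_3\right\} f(y_1)\, dy_1 \\
&= 4!\, f(y_2) \int_{-\infty}^{y_2} \left\{\int_{y_2}^{\infty} [1 - F(y_3)]\, f(y_3)\, dy_3\right\} f(y_1)\, dy_1 \\
&= 4!\, f(y_2) \int_{-\infty}^{y_2} \frac{[1 - F(y_2)]^2}{2}\, f(y_1)\, dy_1 \\
&= 4!\, f(y_2)\, \frac{[1 - F(y_2)]^2}{2!}\, F(y_2), \quad y_2 \in \mathcal{R}.
\end{aligned}$$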

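The procedure for computing the marginal pdf of $X_{(r)}$, the rth-order
statistic of $X_1, X_2, \dots, X_n$, is similar. The following theorem summarizes
the result.


Theorem 3. The marginal pdf of $X_{(r)}$ is given by

$$g_r(y_r) = \frac{n!}{(r - 1)!\,(n - r)!}\, [F(y_r)]^{r-1}\, [1 - F(y_r)]^{n-r}\, f(y_r), \tag{2}$$

where $F$ is the common df of $X_1, X_2, \dots, X_n$.


Proof.

$$\begin{aligned}
g_r(y_r) &= n!\, f(y_r) \int_{-\infty}^{y_r} \int_{-\infty}^{y_{r-1}} \cdots \int_{-\infty}^{y_2} \int_{y_r}^{\infty} \int_{y_{r+1}}^{\infty} \cdots \int_{y_{n-1}}^{\infty} \prod_{i \ne r} f(y_i)\, dy_n \cdots dy_{r+1}\, dy_1 \cdots dy_{r-1} \\
&= n!\, f(y_r)\, \frac{[1 - F(y_r)]^{n-r}}{(n - r)!} \int_{-\infty}^{y_r} \int_{-\infty}^{y_{r-1}} \cdots \int_{-\infty}^{y_2} \prod_{i=1}^{r-1} f(y_i)\, dy_1 \cdots dy_{r-1} \\
&= n!\, f(y_r)\, \frac{[1 - F(y_r)]^{n-r}}{(n - r)!}\, \frac{[F(y_r)]^{r-1}}{(r - 1)!},
\end{aligned}$$

as asserted.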
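
A quick numerical sanity check of the marginal pdf in Theorem 3 is to confirm that it integrates to 1 and, in the uniform case on $(0, 1)$, that its mean is $r/(n + 1)$ (a standard fact about the beta distribution, stated here as background rather than taken from the text). The helper names `order_stat_pdf` and `midpoint` and the choice $r = 2$, $n = 5$ are assumptions for illustration.

```python
from math import factorial


def order_stat_pdf(r, n, y, F, f):
    """Marginal pdf of the r-th order statistic at the point y (Theorem 3)."""
    c = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return c * F(y) ** (r - 1) * (1 - F(y)) ** (n - r) * f(y)


def midpoint(g, lo, hi, steps=10000):
    """Midpoint-rule approximation of the integral of g over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


# Uniform distribution on (0, 1): F(y) = y and f(y) = 1 there.
F = lambda y: y
f = lambda y: 1.0

r, n = 2, 5                                # illustrative choice
total = midpoint(lambda y: order_stat_pdf(r, n, y, F, f), 0.0, 1.0)
mean = midpoint(lambda y: y * order_stat_pdf(r, n, y, F, f), 0.0, 1.0)
```

With these values `total` should be close to 1 and `mean` close to 1/3.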


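We now compute the joint pdf of $X_{(j)}$ and $X_{(k)}$, $1 \le j < k \le n$.


Theorem 4. The joint pdf of $X_{(j)}$ and $X_{(k)}$ is given by

$$g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j - 1)!\,(k - j - 1)!\,(n - k)!}\, F^{j-1}(y_j)\, [F(y_k) - F(y_j)]^{k-j-1}\, [1 - F(y_k)]^{n-k}\, f(y_j) f(y_k) & \text{if } y_j < y_k, \\[1.5ex] 0 & \text{otherwise.} \end{cases} \tag{3}$$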


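Proof.

$$\begin{aligned}
g_{jk}(y_j, y_k) &= \int_{-\infty}^{y_j} \cdots \int_{-\infty}^{y_2} \int_{y_j}^{y_k} \cdots \int_{y_{k-2}}^{y_k} \int_{y_k}^{\infty} \cdots \int_{y_{n-1}}^{\infty} n!\, f(y_1) \cdots f(y_n)\, dy_n \cdots dy_{k+1}\, dy_{k-1} \cdots dy_{j+1}\, dy_1 \cdots dy_{j-1} \\
&= n! \int_{-\infty}^{y_j} \cdots \int_{-\infty}^{y_2} \int_{y_j}^{y_k} \cdots \int_{y_{k-2}}^{y_k} \frac{[1 - F(y_k)]^{n-k}}{(n - k)!}\, f(y_1) \cdots f(y_k)\, dy_{k-1} \cdots dy_{j+1}\, dy_1 \cdots dy_{j-1} \\
&= n!\, \frac{[1 - F(y_k)]^{n-k}}{(n - k)!}\, f(y_k)\, \frac{[F(y_k) - F(y_j)]^{k-j-1}}{(k - j - 1)!}\, f(y_j) \int_{-\infty}^{y_j} \cdots \int_{-\infty}^{y_2} f(y_1) \cdots f(y_{j-1})\, dy_1 \cdots dy_{j-1} \\
&= \frac{n!}{(n - k)!\,(k - j - 1)!\,(j - 1)!}\, [1 - F(y_k)]^{n-k}\, [F(y_k) - F(y_j)]^{k-j-1}\, [F(y_j)]^{j-1}\, f(y_j) f(y_k), \quad y_j < y_k,
\end{aligned}$$

as asserted.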


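In a similar manner we can show that the joint pdf of $X_{(j_1)}, \dots, X_{(j_k)}$,
$1 \le j_1 < j_2 < \cdots < j_k \le n$, $1 \le k \le n$, is given by

$$\begin{aligned}
g_{j_1, j_2, \dots, j_k}(y_1, y_2, \dots, y_k) = {} & \frac{n!}{(j_1 - 1)!\,(j_2 - j_1 - 1)! \cdots (n - j_k)!} \\
& \times F^{j_1 - 1}(y_1) f(y_1)\, [F(y_2) - F(y_1)]^{j_2 - j_1 - 1} f(y_2) \cdots [1 - F(y_k)]^{n - j_k} f(y_k)
\end{aligned}$$

for $y_1 < y_2 < \cdots < y_k$ and $= 0$ otherwise.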


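Example 4. Let $X_1, X_2, \dots, X_n$ be iid rv's with common pdf

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$g_r(y_r) = \begin{cases} \dfrac{n!}{(r - 1)!\,(n - r)!}\, y_r^{r-1} (1 - y_r)^{n-r}, & 0 < y_r < 1 \quad (1 \le r \le n), \\[1.5ex] 0, & \text{otherwise.} \end{cases}$$

The joint distribution of $X_{(j)}$ and $X_{(k)}$ is given by

$$g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j - 1)!\,(k - j - 1)!\,(n - k)!}\, y_j^{j-1} (y_k - y_j)^{k-j-1} (1 - y_k)^{n-k}, & 0 < y_j < y_k < 1, \\[1.5ex] 0, & \text{otherwise,} \end{cases}$$

where $1 \le j < k \le n$.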


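Example 5. Let $X_{(1)}, X_{(2)}, X_{(3)}$ be the order statistics of iid rv's $X_1, X_2,
X_3$ with common pdf

$$f(x) = \begin{cases} \beta e^{-\beta x}, & x > 0, \\ 0, & \text{otherwise,} \end{cases} \qquad (\beta > 0).$$

Let $Y_1 = X_{(3)} - X_{(2)}$ and $Y_2 = X_{(2)}$. We show that $Y_1$ and $Y_2$ are independent.
The joint pdf of $X_{(2)}$ and $X_{(3)}$ is given by

$$g_{23}(x, y) = \begin{cases} \dfrac{3!}{1!\,0!\,0!}\, (1 - e^{-\beta x})\, \beta e^{-\beta x}\, \beta e^{-\beta y}, & x < y, \\[1.5ex] 0, & \text{otherwise.} \end{cases}$$

The pdf of $(Y_1, Y_2)$ is

$$\begin{aligned}
f(y_1, y_2) &= 3!\, \beta^2 (1 - e^{-\beta y_2})\, e^{-\beta y_2}\, e^{-\beta(y_1 + y_2)} \\
&= \begin{cases} \left[3!\, \beta e^{-2\beta y_2} (1 - e^{-\beta y_2})\right] (\beta e^{-\beta y_1}), & 0 < y_1 < \infty,\ 0 < y_2 < \infty, \\ 0, & \text{otherwise.} \end{cases}
\end{aligned}$$

It follows that $Y_1$ and $Y_2$ are independent.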
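PROBLEMS 4.5


1. Let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ be the set of order statistics of independent rv's
$X_1, X_2, \dots, X_n$ with common pdf

$$f(x) = \begin{cases} \beta e^{-\beta x} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

(a) Show that $X_{(r)}$ and $X_{(s)} - X_{(r)}$ are independent for any $s > r$.

(b) Find the pdf of $X_{(r+1)} - X_{(r)}$.

(c) Let $Z_1 = nX_{(1)}$, $Z_2 = (n - 1)(X_{(2)} - X_{(1)})$, $Z_3 = (n - 2)(X_{(3)} - X_{(2)})$, $\dots$,
$Z_n = (X_{(n)} - X_{(n-1)})$. Show that $(Z_1, Z_2, \dots, Z_n)$ and $(X_1, X_2, \dots, X_n)$ are
identically distributed.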

2. Let X, X, +++, X, be iid with pdf 


yj fe ee - 0/1 if x6, 
0 if x < 6. 


Show that $X_{(1)}, X_{(2)} - X_{(1)}, X_{(3)} - X_{(2)}, \dots, X_{(n)} - X_{(n-1)}$ are independent.
3. Let $X_1, X_2, \dots, X_n$ be iid with a df
AN Cal #0 <у<1, 
қ) = 


otherwise, 
Show that ХХ, і = 1, 2, =, n — 1, and X; are independent. 
4. Let X, and X, be independent rv's with common pmf 
Р(Х =x} -p1—p)y-4x-21.12,;0«p«l. 
Show that X4; and Хо — Xw are independent. 


а> 0. 


5. Let $X_1, X_2, \dots, X_n$ be iid nonnegative rv's of the continuous type. If $E|X| < \infty$,
show that $E|X_{(r)}| < \infty$. Write $M_n = X_{(n)} = \max(X_1, X_2, \dots, X_n)$. Show that

$$EM_n = EM_{n-1} + \int_0^{\infty} F^{n-1}(x)\,[1 - F(x)]\, dx, \quad n = 2, 3, \dots.$$
Find EM, in each of the following cases: 
(a) X, have the common df 
j Fx)y21—e-, x20, 
(b) X; have the соттоп. 
F(x) = х,0<х <1. 


6. Let Xa» Xo, +, Хоу be the order statistics of n independent rv's 
АХ, Xo =, X, with common pdf f(x) = 1 if 0 < x < I, and = 0 otherwise. Show 
that Y, = ХХ» Ye = Ха Хау s Ya = Xo-p/Xo» and Y, = Xo» are 
independent, Find the pdf's of Y, Y» =, d. 


4.6 MOMENTS AND MOMENT GENERATING FUNCTIONS 


Let $(X, Y)$ be a two-dimensional rv, and $g$ be a Borel-measurable function on $\mathcal{R}_2$.


Definition 1. Let $(X, Y)$ be a two-dimensional rv of the discrete type with pmf $p_{ij} = P\{X = x_i, Y = y_j\}$, and let $g: \mathcal{R}_2 \to \mathcal{R}$ be a Borel-measurable function. If $\sum_{i,j} p_{ij}\,|g(x_i, y_j)| < \infty$, the series

$Eg(X, Y) = \sum_{i,j} p_{ij}\, g(x_i, y_j)$

is called the expected value of $g(X, Y)$.


Definition 2. Let $(X, Y)$ be an rv of the continuous type with joint density $f(x, y)$, and let $g: \mathcal{R}_2 \to \mathcal{R}$ be a Borel-measurable function. If

$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g(x, y)|\, f(x, y)\,dx\,dy < \infty$,

the integral

$Eg(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f(x, y)\,dx\,dy$

is called the expected value of $g(X, Y)$.


We shall be mainly interested in functions of the type $g(x, y) = x^j y^k$, where $j$ and $k$ are nonnegative integers.


Definition 3. If $E(X^j Y^k)$ exists for nonnegative integers $j$ and $k$, we call it a moment of order $(j + k)$ of $(X, Y)$ and write
(1) $m_{jk} = E(X^j Y^k)$.
If moments of order 1 of $(X, Y)$ exist, then clearly
$m_{10} = EX$, $m_{01} = EY$.


Either or both of these moments may not exist. Similarly, there are three 
possible moments of order 2: 


$m_{20} = EX^2$, $m_{11} = E(XY)$, $m_{02} = EY^2$.

Definition 4. If $E\{(X - EX)^j (Y - EY)^k\}$ exists for nonnegative integers $j$ and $k$, we call it a central moment of order $(j + k)$ and write


(2) $\mu_{jk} = E\{(X - EX)^j (Y - EY)^k\}$.


Clearly


$\mu_{10} = \mu_{01} = 0$,
$\mu_{20} = \operatorname{var}(X)$, $\mu_{02} = \operatorname{var}(Y)$,


and
$\mu_{11} = E\{(X - m_{10})(Y - m_{01})\}$.


Definition 5. If $E\{(X - EX)(Y - EY)\}$ exists, we call it the covariance between $X$ and $Y$ and write


(3) $\operatorname{cov}(X, Y) = E\{(X - EX)(Y - EY)\}$.
We have
$\operatorname{cov}(X, Y) = E\{(X - m_{10})(Y - m_{01})\}$
$= E(XY) - EX\,EY$
(4) $= m_{11} - m_{10}\,m_{01}$.


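Example 1. Let $(X, Y)$ be jointly distributed with density function

$f(x, y) = x + y$ if $0 < x < 1$, $0 < y < 1$, and $= 0$ otherwise.


Then
$EX^l Y^m = \int_0^1\int_0^1 x^l y^m (x + y)\,dx\,dy$

$= \int_0^1\int_0^1 x^{l+1} y^m\,dx\,dy + \int_0^1\int_0^1 x^l y^{m+1}\,dx\,dy$

$= \frac{1}{(l+2)(m+1)} + \frac{1}{(l+1)(m+2)}$,

where $l$ and $m$ are positive integers. Thus
$EX = EY = \frac{7}{12}$,
$EX^2 = EY^2 = \frac{5}{12}$,
$\operatorname{var}(X) = \operatorname{var}(Y) = \frac{5}{12} - \frac{49}{144} = \frac{11}{144}$,
$\operatorname{cov}(X, Y) = \frac{1}{3} - \frac{49}{144} = -\frac{1}{144}$.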
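
The values in Example 1 can be checked with exact rational arithmetic. The following is a minimal sketch in Python, assuming only the closed form for $EX^lY^m$ derived above; the function name `moment` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction


def moment(l, m):
    """E(X^l Y^m) for f(x, y) = x + y on the unit square (closed form of Example 1)."""
    return Fraction(1, (l + 2) * (m + 1)) + Fraction(1, (l + 1) * (m + 2))


ex = moment(1, 0)                    # EX
ey = moment(0, 1)                    # EY
var_x = moment(2, 0) - ex ** 2       # var(X) = EX^2 - (EX)^2
cov_xy = moment(1, 1) - ex * ey      # cov(X, Y) = E(XY) - EX EY

print(ex, ey, var_x, cov_xy)
```

The printed fractions should agree with the values obtained in the example.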


It is clear that Definitions 1 to 4 can be generalized to the n-dimensional 
case. 


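Theorem 1. Let $X_1, X_2, \ldots, X_n$ be rv's such that $E|X_i| < \infty$, $i = 1, 2, \ldots, n$. Let $a_1, a_2, \ldots, a_n$ be real numbers, and write

$S = a_1X_1 + a_2X_2 + \cdots + a_nX_n$.
Then $ES$ exists, and we have


(5) $ES = \sum_{j=1}^{n} a_j\,EX_j$.


Proof. If $(X_1, X_2, \ldots, X_n)$ is of the discrete type, then
$ES = \sum_{i_1, \ldots, i_n} (a_1x_{i_1} + a_2x_{i_2} + \cdots + a_nx_{i_n})\,P\{X_1 = x_{i_1}, X_2 = x_{i_2}, \ldots, X_n = x_{i_n}\}$
$= a_1\sum_{i_1} x_{i_1} \sum_{i_2, \ldots, i_n} P\{X_1 = x_{i_1}, \ldots, X_n = x_{i_n}\}$
$\quad + \cdots + a_n\sum_{i_n} x_{i_n} \sum_{i_1, \ldots, i_{n-1}} P\{X_1 = x_{i_1}, \ldots, X_n = x_{i_n}\}$
$= a_1\sum_{i_1} x_{i_1}P\{X_1 = x_{i_1}\} + \cdots + a_n\sum_{i_n} x_{i_n}P\{X_n = x_{i_n}\}$
$= a_1EX_1 + \cdots + a_nEX_n$.


The existence of $ES$ follows easily by replacing each $a_j$ by $|a_j|$ and each $x_{i_j}$ by $|x_{i_j}|$ and remembering that $E|X_j| < \infty$, $j = 1, 2, \ldots, n$. The case of continuous type $(X_1, X_2, \ldots, X_n)$ is similarly treated.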


Corollary. Take $a_1 = a_2 = \cdots = a_n = 1/n$. Then

$E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n}\sum_{i=1}^{n} EX_i$,

and if $EX_1 = EX_2 = \cdots = EX_n = \mu$, then

$E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \mu$.


Theorem 2. Let $X_1, X_2, \ldots, X_n$ be independent rv's such that $E|X_i| < \infty$, $i = 1, 2, \ldots, n$. Then $E(\prod_{i=1}^{n} X_i)$ exists and


(6) $E\left(\prod_{i=1}^{n} X_i\right) = \prod_{i=1}^{n} EX_i$.

Proof. The proof is left as an exercise.
Corollary. If $X$ and $Y$ are independent, $\operatorname{cov}(X, Y) = 0$.


We recall that 
cov (X, Y) = E(XY) — EX EY, 
so that, if X and Y are independent, EXY = EX EY and the result follows. 


We emphasize the fact that, if X and Y have zero covariance, it does not 
follow that they are independent. 


Example 2. Let $X$ be a symmetric rv with finite third moment, and let $Y = X^2$. Then $EX = 0$ and $E(XY) = EX^3 = 0$, and it follows that $X$ and $Y$ have zero covariance. But $X$ and $Y$ are not independent. In fact, they are strongly dependent.
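
A concrete instance of Example 2 may help. The sketch below is an illustration of ours, assuming $X$ is uniform on $\{-1, 0, 1\}$ (a choice not made in the text) and $Y = X^2$; it computes the covariance exactly and compares $P\{X = 0, Y = 0\}$ with $P\{X = 0\}P\{Y = 0\}$.

```python
from fractions import Fraction

# Assumed illustration: X uniform on {-1, 0, 1}, Y = X^2.
support = [-1, 0, 1]
p = Fraction(1, 3)

ex = sum(p * x for x in support)              # EX
ey = sum(p * x ** 2 for x in support)         # EY = EX^2
exy = sum(p * x * x ** 2 for x in support)    # E(XY) = EX^3
cov = exy - ex * ey

# Y = 0 exactly when X = 0, so all three probabilities below equal 1/3.
p_joint = sum(p for x in support if x == 0 and x ** 2 == 0)
p_x0 = sum(p for x in support if x == 0)
p_y0 = sum(p for x in support if x ** 2 == 0)
factorizes = (p_joint == p_x0 * p_y0)

print(cov, p_joint, p_x0 * p_y0, factorizes)
```

Here the covariance vanishes while the joint probability fails to factor, so the two rv's are uncorrelated but dependent.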


Let $X$ and $Y$ be independent, and $g_1(\cdot)$ and $g_2(\cdot)$ be Borel-measurable functions. Then we know (Theorem 4.3.3) that $g_1(X)$ and $g_2(Y)$ are independent. If $E\{g_1(X)\}$, $E\{g_2(Y)\}$, and $E\{g_1(X)g_2(Y)\}$ exist, it follows from Theorem 2 that

(7) $E\{g_1(X)g_2(Y)\} = E\{g_1(X)\}\,E\{g_2(Y)\}$.

Conversely, if for any Borel sets $A_1$ and $A_2$ we take $g_1(X) = 1$ if $X \in A_1$, and $= 0$ otherwise, and $g_2(Y) = 1$ if $Y \in A_2$, and $= 0$ otherwise, then

$E\{g_1(X)g_2(Y)\} = P\{X \in A_1, Y \in A_2\}$


and $E\{g_1(X)\} = P\{X \in A_1\}$, $E\{g_2(Y)\} = P\{Y \in A_2\}$. Relation (7) implies that for any Borel sets $A_1$ and $A_2$ of real numbers

$P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\,P\{Y \in A_2\}$.

It follows that $X$ and $Y$ are independent if (7) holds. We have thus proved the following theorem.


Theorem 3. Two rv's $X$ and $Y$ are independent if and only if for every pair of Borel-measurable functions $g_1$ and $g_2$ the relation


(8) $E\{g_1(X)g_2(Y)\} = E\{g_1(X)\}\,E\{g_2(Y)\}$

holds, provided that the expectations on both sides of (8) exist.

Theorem 4. Let $X$ and $Y$ be two rv's with finite variances. Then $\operatorname{cov}(X, Y)$ exists. Moreover,


(9) $E^2(XY) \le EX^2\,EY^2$

with equality if and only if there exist real numbers $\alpha$ and $\beta$, not both zero, such that $P\{\alpha X + \beta Y = 0\} = 1$. [Relation (9) is known as the Cauchy-Schwarz inequality.]


Proof. Since

$|ab| \le \frac{a^2 + b^2}{2}$

for every pair of real numbers $a$ and $b$, it follows that $E|XY|$ exists if $EX^2 < \infty$ and $EY^2 < \infty$. To prove (9) we have, for any real numbers $\alpha$ and $\beta$,


$E(\alpha X + \beta Y)^2 = \alpha^2 EX^2 + 2\alpha\beta E(XY) + \beta^2 EY^2 \ge 0$.


If $EX^2 = 0$, (9) is trivially true. Assume that $EX^2 > 0$, and choose $\alpha = -[E(XY)/EX^2]$ and $\beta = 1$. We have


$\frac{E^2(XY)}{EX^2} - \frac{2E^2(XY)}{EX^2} + EY^2 \ge 0$,


which yields (9). Equality holds if and only if there exist real numbers $\alpha$ and $\beta$, not both zero, such that $E(\alpha X + \beta Y)^2 = 0$, which happens if and only if $P\{\alpha X + \beta Y = 0\} = 1$.


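Theorem 5. Let $X_1, X_2, \ldots, X_n$ be rv's with $E|X_i|^2 < \infty$ for $i = 1, 2, \ldots, n$. Let $a_1, a_2, \ldots, a_n$ be real numbers and write


$S = \sum_{i=1}^{n} a_iX_i$.


Then the variance of $S$ exists and is given by

(10) $\operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i) + \sum_{i \ne j}\sum a_ia_j \operatorname{cov}(X_i, X_j)$.


If, in particular, $X_1, X_2, \ldots, X_n$ are such that $\operatorname{cov}(X_i, X_j) = 0$ for $i, j = 1, 2, \ldots, n$, $i \ne j$, then


(11) $\operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i)$.


Proof. We have
$\operatorname{var}(S) = E\left\{\sum_{i=1}^{n} a_iX_i - \sum_{i=1}^{n} a_iEX_i\right\}^2$ (by Theorem 1)
$= E\left\{\sum_{i=1}^{n} a_i^2(X_i - EX_i)^2 + \sum_{i \ne j}\sum a_ia_j(X_i - EX_i)(X_j - EX_j)\right\}$
$= \sum_{i=1}^{n} a_i^2 E(X_i - EX_i)^2 + \sum_{i \ne j}\sum a_ia_j E\{(X_i - EX_i)(X_j - EX_j)\}$.


If the $X_i$'s satisfy
$\operatorname{cov}(X_i, X_j) = 0$ for $i, j = 1, 2, \ldots, n$; $i \ne j$,
the second term on the right side of (10) vanishes, and we have (11).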


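Corollary. Let $X_1, X_2, \ldots, X_n$ be iid rv's with $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Then

$\operatorname{var}\left(\sum_{i=1}^{n} a_iX_i\right) = \sigma^2 \sum_{i=1}^{n} a_i^2$.


In particular,


$\operatorname{var}\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{\sigma^2}{n}$.


Note that we only need $\operatorname{cov}(X_j, X_k) = 0$, $j, k = 1, 2, \ldots, n$, $j \ne k$, rather than independence of the $X_i$'s.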


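Theorem 6. Let $X_1, X_2, \ldots, X_n$ be iid rv's with common variance $\sigma^2$. Also, let $a_1, a_2, \ldots, a_n$ be real numbers such that $\sum_{i=1}^{n} a_i = 1$, and let $S = \sum_{i=1}^{n} a_iX_i$. Then the variance of $S$ is least if we choose $a_i = 1/n$, $i = 1, 2, \ldots, n$.


Proof. We have
$\operatorname{var}(S) = \sigma^2 \sum_{i=1}^{n} a_i^2$,


which is least if and only if we choose the $a_i$ so that $\sum_{i=1}^{n} a_i^2$ is smallest, subject to the condition $\sum_{i=1}^{n} a_i = 1$. We have


$\sum_{i=1}^{n} a_i^2 = \sum_{i=1}^{n}\left(a_i - \frac{1}{n} + \frac{1}{n}\right)^2 = \sum_{i=1}^{n}\left(a_i - \frac{1}{n}\right)^2 + \frac{1}{n}$,

where the cross term vanishes because $\sum_{i=1}^{n}(a_i - 1/n) = 0$. This expression is minimized for the choice $a_i = 1/n$, $i = 1, 2, \ldots, n$.


Note that the result holds if we replace independence by the condition that $\operatorname{cov}(X_i, X_j) = 0$, $i, j = 1, 2, \ldots, n$ ($i \ne j$).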


Example 3. Suppose that $r$ balls are drawn one at a time without replacement from a bag containing $n$ white and $m$ black balls. Let $S_r$ be the number of black balls drawn.

Let us define rv's $X_k$ as follows:


$X_k = 1$ if the $k$th ball drawn is black, and $= 0$ if the $k$th ball drawn is white, $k = 1, 2, \ldots, r$.

Then
$S_r = X_1 + X_2 + \cdots + X_r$.
Also

(12) $P\{X_k = 1\} = \frac{m}{m+n}$, $P\{X_k = 0\} = \frac{n}{m+n}$.

Thus $EX_k = m/(m+n)$, and

$\operatorname{var}(X_k) = \frac{m}{m+n} - \frac{m^2}{(m+n)^2} = \frac{mn}{(m+n)^2}$.


To compute $\operatorname{cov}(X_j, X_k)$, $j \ne k$, note that the rv $X_jX_k = 1$ if the $j$th and the $k$th balls drawn are black, and $= 0$ otherwise. Thus


(13) $E(X_jX_k) = P\{X_j = 1, X_k = 1\} = \frac{m}{m+n} \cdot \frac{m-1}{m+n-1}$

and

$\operatorname{cov}(X_j, X_k) = -\frac{mn}{(m+n)^2(m+n-1)}$.

Thus


$ES_r = \sum_{k=1}^{r} EX_k = \frac{mr}{m+n}$

and

$\operatorname{var}(S_r) = r\,\frac{mn}{(m+n)^2} - r(r-1)\,\frac{mn}{(m+n)^2(m+n-1)} = \frac{mnr}{(m+n)^2(m+n-1)}\,(m+n-r)$.

The reader is asked to satisfy himself that (12) and (13) hold.
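
The mean and variance of $S_r$ obtained through the indicator rv's can be compared with the values computed directly from the hypergeometric pmf. The following Python sketch is an illustration under that assumption; the function names and the particular values $m = 3$, $n = 5$, $r = 4$ are our own choices.

```python
from fractions import Fraction
from math import comb


def moments_from_indicators(m, n, r):
    """Mean and variance of S_r from EX_k, var(X_k), cov(X_j, X_k) and relation (10)."""
    total = m + n
    mean_k = Fraction(m, total)
    var_k = Fraction(m * n, total ** 2)
    cov_jk = Fraction(-m * n, total ** 2 * (total - 1))
    # r variance terms and r(r - 1) ordered pairs (j, k), j != k.
    return r * mean_k, r * var_k + r * (r - 1) * cov_jk


def moments_from_pmf(m, n, r):
    """Mean and variance of the number of black balls, from the hypergeometric pmf."""
    total = m + n
    pmf = {k: Fraction(comb(m, k) * comb(n, r - k), comb(total, r))
           for k in range(r + 1)}
    mean = sum(k * p for k, p in pmf.items())
    var = sum(k * k * p for k, p in pmf.items()) - mean ** 2
    return mean, var


by_indicators = moments_from_indicators(3, 5, 4)
by_pmf = moments_from_pmf(3, 5, 4)
print(by_indicators, by_pmf)
```

Both routes should return the same pair of fractions, in agreement with the closed form $mnr(m+n-r)/[(m+n)^2(m+n-1)]$ for the variance.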


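Example 4. Let $X_1, X_2, \ldots, X_n$ be independent, and $a_1, a_2, \ldots, a_n$ be real numbers such that $\sum_{i=1}^{n} a_i = 1$. Assume that $E|X_i|^2 < \infty$, $i = 1, 2, \ldots, n$, and let $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots, n$. Write $S = \sum_{i=1}^{n} a_iX_i$. Then $\operatorname{var}(S) = \sum_{i=1}^{n} a_i^2\sigma_i^2 = \sigma$, say. To find weights $a_i$ such that $\sigma$ is minimum, we write


$\sigma = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + (1 - a_1 - a_2 - \cdots - a_{n-1})^2\sigma_n^2$


and differentiate partially with respect to $a_1, a_2, \ldots, a_{n-1}$, respectively. We get


$\frac{\partial\sigma}{\partial a_1} = 2a_1\sigma_1^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1})\sigma_n^2 = 0$,

$\cdots$

$\frac{\partial\sigma}{\partial a_{n-1}} = 2a_{n-1}\sigma_{n-1}^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1})\sigma_n^2 = 0$.


It follows that
$a_j\sigma_j^2 = a_n\sigma_n^2$, $j = 1, 2, \ldots, n-1$;


that is, the weights $a_j$, $j = 1, 2, \ldots, n$, should be chosen proportional to $1/\sigma_j^2$. The minimum value of $\sigma$ is then


$\sigma_{\min} = \sum_{i=1}^{n} \frac{k^2}{\sigma_i^4}\,\sigma_i^2 = k^2\sum_{i=1}^{n} \frac{1}{\sigma_i^2}$,


where $k$ is given by $\sum_{j=1}^{n} (k/\sigma_j^2) = 1$. Thus


$\sigma_{\min} = \frac{1}{\sum_{j=1}^{n} (1/\sigma_j^2)} = \frac{H}{n}$,

where $H$ is the harmonic mean of the variances $\sigma_j^2$.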
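
The weighting rule of Example 4 is easy to try numerically. The following Python sketch assumes three hypothetical variances $1, 2, 4$ (our choice) and computes the weights proportional to $1/\sigma_j^2$, the resulting minimum variance, and the variance under equal weights for comparison.

```python
from fractions import Fraction


def optimal_weights(variances):
    """Weights a_j = k / sigma_j^2 with k chosen so that the weights sum to 1."""
    inverse = [1 / Fraction(v) for v in variances]
    k = 1 / sum(inverse)
    return [k * w for w in inverse], k


def variance_of_combination(weights, variances):
    """var(S) = sum a_j^2 sigma_j^2 for independent X_j."""
    return sum(a * a * v for a, v in zip(weights, variances))


variances = [1, 2, 4]                       # hypothetical sigma_j^2
weights, k = optimal_weights(variances)
best = variance_of_combination(weights, variances)
equal = variance_of_combination([Fraction(1, 3)] * 3, variances)
harmonic_mean = len(variances) / sum(1 / Fraction(v) for v in variances)

print(weights, best, equal, harmonic_mean / len(variances))
```

The minimum variance equals $k$, which is the harmonic mean of the variances divided by $n$, and it does not exceed the variance obtained with equal weights.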
We now define the moment generating function of a random vector. For notational convenience we restrict ourselves to the bivariate case.


Definition 6. Let $(X, Y)$ be a two-dimensional rv. If $E(e^{t_1X + t_2Y})$ exists for $|t_1| < h_1$, $|t_2| < h_2$, where $h_1$ and $h_2$ are positive real numbers, we write

(14) $M(t_1, t_2) = E(e^{t_1X + t_2Y})$

and call it the moment generating function of the joint distribution of $(X, Y)$ or, simply, the mgf of $(X, Y)$.


Theorem 7. The mgf $M(t_1, t_2)$ uniquely determines the joint distribution of $(X, Y)$, and conversely, if the mgf exists it is unique.


Corollary. The mgf $M(t_1, t_2)$ completely determines the marginal distributions of $X$ and $Y$. Indeed,


(15) $M(t_1, 0) = E(e^{t_1X}) = M_X(t_1)$,
(16) $M(0, t_2) = E(e^{t_2Y}) = M_Y(t_2)$.


Proof. For the proofs we refer the reader to Widder [137], page 460 (see also P. 2.14).


Theorem 8. If $M(t_1, t_2)$ exists, the moments of all orders of $(X, Y)$ exist and may be obtained from


(17) $\left.\frac{\partial^{m+n} M(t_1, t_2)}{\partial t_1^m\,\partial t_2^n}\right|_{t_1 = t_2 = 0} = E(X^mY^n)$.


Thus

$\frac{\partial M(0, 0)}{\partial t_1} = EX$, $\frac{\partial M(0, 0)}{\partial t_2} = EY$,

$\frac{\partial^2 M(0, 0)}{\partial t_1^2} = EX^2$, $\frac{\partial^2 M(0, 0)}{\partial t_2^2} = EY^2$,

$\frac{\partial^2 M(0, 0)}{\partial t_1\,\partial t_2} = E(XY)$,


and so on.


Proof. See Widder [137], pages 446-447, for the proof (see also P. 2.14).

Theorem 9. $X$ and $Y$ are independent rv's if and only if

(18) $M(t_1, t_2) = M(t_1, 0)\,M(0, t_2)$ for all $t_1, t_2 \in \mathcal{R}$.


Proof. Let $X$ and $Y$ be independent. Then

$M(t_1, t_2) = E(e^{t_1X + t_2Y}) = E(e^{t_1X})\,E(e^{t_2Y}) = M(t_1, 0)\,M(0, t_2)$.

Conversely, if

$M(t_1, t_2) = M(t_1, 0)\,M(0, t_2)$,

then, in the continuous case,

$\int\int e^{t_1x + t_2y} f(x, y)\,dx\,dy = \left[\int e^{t_1x} f_1(x)\,dx\right]\left[\int e^{t_2y} f_2(y)\,dy\right]$,

that is,

$\int\int e^{t_1x + t_2y} f(x, y)\,dx\,dy = \int\int e^{t_1x + t_2y} f_1(x) f_2(y)\,dx\,dy$.

By the uniqueness of the mgf (Theorem 7) we must have


$f(x, y) = f_1(x) f_2(y)$ for all $(x, y) \in \mathcal{R}_2$.

It follows that $X$ and $Y$ are independent. A similar proof is given in the case where $(X, Y)$ is of the discrete type.
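Example 5. Let $(X, Y)$ be jointly distributed with density function


$f(x, y) = e^{-x-y}$ if $0 < x < \infty$, $0 < y < \infty$, and $= 0$ otherwise.


Then
$M(t_1, t_2) = \int_0^{\infty}\int_0^{\infty} e^{t_1x + t_2y - x - y}\,dx\,dy = \frac{1}{(1 - t_1)(1 - t_2)}$, $t_1 < 1$, $t_2 < 1$,

$EX = \frac{\partial M(0, 0)}{\partial t_1} = 1$, $EY = \frac{\partial M(0, 0)}{\partial t_2} = 1$,

and

$\left.\frac{\partial^2 M(t_1, t_2)}{\partial t_1^2}\right|_{t_1 = t_2 = 0} = 2$, so that $EX^2 = 2$.

Similarly $EY^2 = 2$, so that $\operatorname{var}(X) = \operatorname{var}(Y) = 1$, and


$EXY = \left.\frac{\partial^2 M(t_1, t_2)}{\partial t_1\,\partial t_2}\right|_{t_1 = t_2 = 0} = 1$,

so that $\operatorname{cov}(X, Y) = 1 - 1 = 0$. Indeed, $X$ and $Y$ are independent since
$f(x, y) = e^{-x}e^{-y} = f_1(x) f_2(y)$ for all $x, y \in \mathcal{R}$,
where $f_1$, $f_2$ are the two marginal densities.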

Example 6. For the bivariate density of Example 1 we have

$M(t_1, t_2) = \int_0^1\int_0^1 (x + y)\,e^{t_1x + t_2y}\,dx\,dy$

$= \frac{1}{t_1^2 t_2}\,[(t_1 - 1)e^{t_1} + 1]\,(e^{t_2} - 1) + \frac{1}{t_1 t_2^2}\,[(t_2 - 1)e^{t_2} + 1]\,(e^{t_1} - 1)$.

Using (17), we can again check that
$EX = EY = \frac{7}{12}$
and $\operatorname{cov}(X, Y) = -\frac{1}{144}$.
We conclude this section with some important moment inequalities. We begin with the simple inequality

(19) $|a + b|^r \le c_r\,(|a|^r + |b|^r)$,


where $c_r = 1$ for $0 \le r \le 1$, and $= 2^{r-1}$ for $r > 1$. For $r = 0$ and $r = 1$, (19) is trivially true.

First note that it is sufficient to prove (19) when $0 < a \le b$. Let $0 < a \le b$, and write $x = a/b$. Then


$\frac{(a + b)^r}{a^r + b^r} = \frac{(1 + x)^r}{1 + x^r}$.


Writing $f(x) = (1 + x)^r/(1 + x^r)$, we see that


$f'(x) = \frac{r(1 + x)^{r-1}}{(1 + x^r)^2}\,(1 - x^{r-1})$,


where $0 < x \le 1$. It follows that $f'(x) > 0$ if $r > 1$, $= 0$ if $r = 1$, and $< 0$ if $r < 1$. Thus


$\max_{0 \le x \le 1} f(x) = f(0) = 1$ if $r \le 1$,

while

$\max_{0 \le x \le 1} f(x) = f(1) = 2^{r-1}$ if $r \ge 1$.

Note that $|a + b|^r \le 2^r(|a|^r + |b|^r)$ is trivially true since
$|a + b| \le \max(2|a|, 2|b|)$.
An immediate application of (19) is the following result.


Theorem 10. Let $X$ and $Y$ be rv's and $r > 0$ be a fixed number. If $E|X|^r$, $E|Y|^r$ are both finite, so also is $E|X + Y|^r$.


Proof. Let $a = X$ and $b = Y$ in (19). Taking the expectation on both sides, we see that

$E|X + Y|^r \le c_r\,(E|X|^r + E|Y|^r)$,

where $c_r = 1$ if $0 < r \le 1$, and $= 2^{r-1}$ if $r > 1$.


Next we establish Hölder's inequality,

(20) $|xy| \le \frac{|x|^p}{p} + \frac{|y|^q}{q}$,


where $p$ and $q$ are positive real numbers such that $p > 1$ and $1/p + 1/q = 1$. Note that for $x > 0$ the function $w = \log x$ is concave. It follows that for $x_1, x_2 > 0$ and $0 \le t \le 1$


$\log[tx_1 + (1 - t)x_2] \ge t\log x_1 + (1 - t)\log x_2$.

Taking antilogarithms, we get

$x_1^t\,x_2^{1-t} \le tx_1 + (1 - t)x_2$.

Now we choose $x_1 = |x|^p$, $x_2 = |y|^q$, $t = 1/p$, $1 - t = 1/q$, where $p > 1$ and $1/p + 1/q = 1$, to get (20).
Theorem 11. Let $p > 1$, $q > 1$ so that $1/p + 1/q = 1$. Then

(21) $E|XY| \le \{E|X|^p\}^{1/p}\,\{E|Y|^q\}^{1/q}$.

Proof. By Hölder's inequality, letting $x = X\{E|X|^p\}^{-1/p}$, $y = Y\{E|Y|^q\}^{-1/q}$, we get

$|XY| \le p^{-1}|X|^p\,\{E|X|^p\}^{1/p - 1}\,\{E|Y|^q\}^{1/q} + q^{-1}|Y|^q\,\{E|Y|^q\}^{1/q - 1}\,\{E|X|^p\}^{1/p}$.

Taking the expectation on both sides leads to (21).


Corollary. Taking $p = q = 2$, we obtain the Cauchy-Schwarz inequality,

$E|XY| \le E^{1/2}|X|^2\,E^{1/2}|Y|^2$.
The final result of this section is an inequality due to Minkowski. 


Theorem 12. For $p \ge 1$

(22) $\{E|X + Y|^p\}^{1/p} \le \{E|X|^p\}^{1/p} + \{E|Y|^p\}^{1/p}$.

Proof. We have, for $p > 1$,

$|X + Y|^p \le |X|\,|X + Y|^{p-1} + |Y|\,|X + Y|^{p-1}$.


Taking expectations and using Hölder's inequality with $Y$ replaced by $|X + Y|^{p-1}$ ($p > 1$), we have

$E|X + Y|^p \le \{E|X|^p\}^{1/p}\,\{E|X + Y|^{(p-1)q}\}^{1/q} + \{E|Y|^p\}^{1/p}\,\{E|X + Y|^{(p-1)q}\}^{1/q}$
$= [\{E|X|^p\}^{1/p} + \{E|Y|^p\}^{1/p}] \cdot \{E|X + Y|^{(p-1)q}\}^{1/q}$.

Excluding the trivial case in which $E|X + Y|^p = 0$, and noting that $(p - 1)q = p$, we have, after dividing both sides of the last inequality by $\{E|X + Y|^p\}^{1/q}$,

$\{E|X + Y|^p\}^{1/p} \le \{E|X|^p\}^{1/p} + \{E|Y|^p\}^{1/p}$, $p > 1$.

The case $p = 1$ being trivial, this establishes (22).
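
The inequalities (21) and (22) can be checked on any small discrete distribution. The following Python sketch uses a hypothetical four-point joint pmf of our own choosing and evaluates both sides of Hölder's and Minkowski's inequalities for several values of $p$; it is a numerical illustration only and proves nothing.

```python
def expect(pmf, g):
    """E g(X, Y) for a finite joint pmf given as {(x, y): probability}."""
    return sum(prob * g(x, y) for (x, y), prob in pmf.items())


def holder_sides(pmf, p):
    """Return (E|XY|, {E|X|^p}^(1/p) {E|Y|^q}^(1/q)) with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    lhs = expect(pmf, lambda x, y: abs(x * y))
    rhs = (expect(pmf, lambda x, y: abs(x) ** p) ** (1 / p)
           * expect(pmf, lambda x, y: abs(y) ** q) ** (1 / q))
    return lhs, rhs


def minkowski_sides(pmf, p):
    """Return ({E|X+Y|^p}^(1/p), {E|X|^p}^(1/p) + {E|Y|^p}^(1/p))."""
    lhs = expect(pmf, lambda x, y: abs(x + y) ** p) ** (1 / p)
    rhs = (expect(pmf, lambda x, y: abs(x) ** p) ** (1 / p)
           + expect(pmf, lambda x, y: abs(y) ** p) ** (1 / p))
    return lhs, rhs


# Hypothetical joint pmf; the probabilities sum to 1.
PMF = {(-1.0, 2.0): 0.2, (0.0, -1.0): 0.3, (2.0, 1.0): 0.1, (3.0, -2.0): 0.4}

for p in (1.5, 2.0, 3.0):
    print(p, holder_sides(PMF, p), minkowski_sides(PMF, p))
```

In each printed pair the first entry should not exceed the second.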


PROBLEMS 4.6


1. Suppose that the rv $(X, Y)$ is uniformly distributed over the region $R = \{(x, y): 0 < x < y < 1\}$. Find the mgf of $(X, Y)$. Hence or otherwise find the covariance between $X$ and $Y$.


2. Let $(X, Y)$ have the joint pdf given by


$f(x, y) = x^2 + \frac{xy}{3}$ if $0 < x < 1$, $0 < y < 2$, and $= 0$ otherwise.

Find the mgf of $(X, Y)$, and moments of order 2.
3. Let $(X, Y)$ be distributed with joint density


$f(x, y) = \frac{1}{4}[1 + xy(x^2 - y^2)]$ if $|x| \le 1$, $|y| \le 1$, and $= 0$ otherwise.


Find the mgf of $(X, Y)$. Are $X$, $Y$ independent? If not, find the covariance between $X$ and $Y$.


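4. For a positive rv $X$ with finite first moment show that (1) $E\sqrt{X} \le \sqrt{EX}$, and (2) $E\{1/X\} \ge 1/EX$.


5. If $X$ is a nondegenerate rv with finite expectation and such that $X \ge a > 0$, then
$E\{\sqrt{X^2 - a^2}\} < \sqrt{(EX)^2 - a^2}$.
(Kruskal [63])
6. Show that for $x > 0$
$\left(\int_x^{\infty} t\,e^{-t^2/2}\,dt\right)^2 \le \int_x^{\infty} e^{-t^2/2}\,dt \int_x^{\infty} t^2 e^{-t^2/2}\,dt$,
and hence that
$\int_x^{\infty} e^{-t^2/2}\,dt \ge \frac{1}{2}[(4 + x^2)^{1/2} - x]\,e^{-x^2/2}$.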

7. For a pdf $f$ that is nondecreasing in the interval $a \le x \le b$, show that for any $s > 0$
$\int_a^b x^{2s} f(x)\,dx \ge \frac{b^{2s+1} - a^{2s+1}}{(2s + 1)(b - a)} \int_a^b f(x)\,dx$,
with the inequality reversed if $f$ is nonincreasing.
8. Derive the Lyapunov inequality (Theorem 3.4.3)


$\{E|X|^r\}^{1/r} \le \{E|X|^s\}^{1/s}$, $1 < r < s < \infty$,


from Hölder's inequality (21).
9. Let $X$ be an rv with $E|X|^r < \infty$ for $r > 0$. Show that the function $\log E|X|^r$ is a convex function of $r$.

10. Show with the help of an example that Theorem 12 is not true for $p < 1$.
11. Show that the converse of Theorem 10 also holds for independent rv's, that is, if $E|X + Y|^r < \infty$ for some $r > 0$ and $X$ and $Y$ are independent, then $E|X|^r < \infty$, $E|Y|^r < \infty$.

(Hint: Without loss of generality assume that the median of both $X$ and $Y$ is 0. Show that, for any $t > 0$, $P\{|X + Y| > t\} \ge \frac{1}{2}P\{|X| > t\}$. Now use the remarks preceding Lemma 3.2.2 to conclude that $E|X|^r < \infty$.)
12. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and $A_1, A_2, \ldots, A_n$ be events in $\mathcal{S}$ such that $P(\bigcup_{k=1}^{n} A_k) > 0$. Show that


$2\sum_{1 \le j < k \le n} P(A_jA_k) \ge \frac{\left(\sum_{k=1}^{n} PA_k\right)^2}{P\left(\bigcup_{k=1}^{n} A_k\right)} - \sum_{k=1}^{n} PA_k$.

(Chung and Erdős [14])


(Hint: Let $X_k$ be the indicator function of $A_k$, $k = 1, 2, \ldots, n$. Use the Cauchy-Schwarz inequality.)


13. Prove Theorem 2. 


4.7 CONDITIONAL EXPECTATION 


In Section 2 we defined the conditional distribution of an rv $X$, given $Y$. We showed that, if $(X, Y)$ is of the discrete type, the conditional pmf of $X$, given $Y = y_j$, where $P\{Y = y_j\} > 0$, is a pmf when considered as a function of the $x_i$'s (for fixed $y_j$). Similarly, if $(X, Y)$ is an rv of the continuous type with pdf $f(x, y)$ and marginal densities $f_1$ and $f_2$, respectively, then, at every point $(x, y)$ at which $f$ is continuous and at which $f_2(y) > 0$ and is continuous, a conditional density function of $X$, given $Y$, exists and may be defined by


$f_{X|Y}(x|y) = \frac{f(x, y)}{f_2(y)}$.

We also showed that $f_{X|Y}(x|y)$, for fixed $y$, when considered as a function of $x$ is a pdf in its own right. Therefore we can (and do) consider the moments of this conditional distribution.


Definition 1. Let $X$ and $Y$ be rv's defined on a probability space $(\Omega, \mathcal{S}, P)$, and let $h$ be a Borel-measurable function. Assume that $Eh(X)$ exists. Then the conditional expectation of $h(X)$, given $Y$, written as $E\{h(X)|Y\}$, is an rv that takes the value $E\{h(X)|y\}$, defined by


(1) $E\{h(X)|y\} = \sum_x h(x)\,P\{X = x|Y = y\}$ if $(X, Y)$ is of the discrete type and $P\{Y = y\} > 0$,
$E\{h(X)|y\} = \int_{-\infty}^{\infty} h(x)\,f_{X|Y}(x|y)\,dx$ if $(X, Y)$ is of the continuous type and $f_2(y) > 0$,


when the rv $Y$ assumes the value $y$.


Needless to say, a similar definition may be given for the conditional expectation $E\{h(Y)|X\}$, provided that $Eh(Y)$ exists.

It is immediate that the rv $E\{h(X)|Y\}$ satisfies the usual properties of an expectation. For example, one can show easily that


(2) $E\{c|Y\} = c$, where $c$ is a constant;
(3) $E\{aX + b|Y\} = aE\{X|Y\} + b$, $a$, $b$ constants;
(4) if $g_1$, $g_2$ are Borel functions and $Eg_i(X)$ exists for $i = 1, 2$, then


$E\{a_1g_1(X) + a_2g_2(X)|Y\} = a_1E\{g_1(X)|Y\} + a_2E\{g_2(X)|Y\}$;
(5) if $X \ge 0$, then $E\{X|Y\} \ge 0$;
(6) if $X_1 \ge X_2$, then $E\{X_1|Y\} \ge E\{X_2|Y\}$.


The moments of a conditional distribution are defined in the usual manner. Thus, if $EX^r$ exists for some integer $r > 0$, then $E\{X^r|Y\}$ defines the $r$th moment of the conditional distribution. We can define the central moments of the conditional distribution and, in particular, the variance. There is no difficulty in generalizing these concepts for $n$-dimensional distributions when $n > 2$. We leave the reader to furnish the details.


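Example 1. An urn contains three red and two green balls. A random sample of two balls is drawn (a) with replacement, and (b) without replacement. Let $X = 0$ if the first ball drawn is green, $= 1$ if the first ball drawn is red, and let $Y = 0$ if the second ball drawn is green, $= 1$ if the second ball drawn is red.

The joint pmf of $(X, Y)$ is given in the following tables:

(a) With Replacement

          X = 0    X = 1    Total
  Y = 0    4/25     6/25     2/5
  Y = 1    6/25     9/25     3/5
  Total    2/5      3/5      1

(b) Without Replacement

          X = 0    X = 1    Total
  Y = 0    1/10     3/10     2/5
  Y = 1    3/10     3/10     3/5
  Total    2/5      3/5      1

The conditional pmf's and the conditional expectations are as follows:
(a) $P\{X = x|Y = y\} = 2/5$ if $x = 0$, $= 3/5$ if $x = 1$, for $y = 0, 1$;
$P\{Y = y|X = x\} = 2/5$ if $y = 0$, $= 3/5$ if $y = 1$, for $x = 0, 1$;
$E\{X|y\} = 3/5$, $y = 0, 1$; $E\{Y|x\} = 3/5$, $x = 0, 1$.
(b) $P\{X = x|Y = 0\} = 1/4$ if $x = 0$, $= 3/4$ if $x = 1$; $P\{X = x|Y = 1\} = 1/2$, $x = 0, 1$;
$P\{Y = y|X = 0\} = 1/4$ if $y = 0$, $= 3/4$ if $y = 1$; $P\{Y = y|X = 1\} = 1/2$, $y = 0, 1$;
$E\{X|y\} = 3/4$ if $y = 0$, $= 1/2$ if $y = 1$; $E\{Y|x\} = 3/4$ if $x = 0$, $= 1/2$ if $x = 1$.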

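Example 2. For the rv $(X, Y)$ considered in Examples 4.2.5 and 4.2.7


$E\{Y|x\} = \int y\,f_{Y|X}(y|x)\,dy = \frac{1 + x}{2}$, $0 < x < 1$,


and
$E\{X|y\} = \int x\,f_{X|Y}(x|y)\,dx = \frac{y}{2}$, $0 < y < 1$.
Also,
$E\{X^2|y\} = \int x^2 f_{X|Y}(x|y)\,dx = \frac{y^2}{3}$, $0 < y < 1$,
and


$\operatorname{var}\{X|y\} = E\{X^2|y\} - [E\{X|y\}]^2 = \frac{y^2}{3} - \frac{y^2}{4} = \frac{y^2}{12}$, $0 < y < 1$.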
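Theorem 1. Let $Eh(X)$ exist. Then
(7) $Eh(X) = E\{E\{h(X)|Y\}\}$.

Proof. Let $(X, Y)$ be of the discrete type. Then
$E\{E\{h(X)|Y\}\} = \sum_y \left\{\sum_x h(x)\,P\{X = x|Y = y\}\right\} P\{Y = y\}$
$= \sum_y \sum_x h(x)\,P\{X = x, Y = y\}$
$= \sum_x h(x) \sum_y P\{X = x, Y = y\}$
$= Eh(X)$.
If $(X, Y)$ is of the continuous type, then


$E\{E\{h(X)|Y\}\} = \int_{-\infty}^{\infty} \left\{\int_{-\infty}^{\infty} h(x)\,f_{X|Y}(x|y)\,dx\right\} f_2(y)\,dy$


$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x)\,f(x, y)\,dx\,dy$


$= \int_{-\infty}^{\infty} h(x) \left\{\int_{-\infty}^{\infty} f(x, y)\,dy\right\} dx$
$= Eh(X)$.


In Theorem 1 it is important that $E|h(X)| < \infty$ for (7) to hold.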


Example 3 (Enis [26]). Let $Y$ be a positive rv with pdf

$f_Y(y) = (2\pi)^{-1/2}\,y^{-1/2}\,e^{-y/2}$ if $y > 0$, and $= 0$ if $y \le 0$,

and let the conditional pdf of $X$, given $y$ ($> 0$), be expressed by
$h(x|y) = (2\pi)^{-1/2}\,y^{1/2}\,e^{-yx^2/2}$, $-\infty < x < \infty$.

Then $E\{X|Y\}$ exists, and since $h(x|y)$ is symmetric it follows that $E\{X|y\} = 0$. The marginal pdf of $X$ is given by


$f(x) = \int_0^{\infty} f_Y(y)\,h(x|y)\,dy$


$= \frac{1}{2\pi}\int_0^{\infty} e^{-y(1 + x^2)/2}\,dy$

$= \frac{1}{\pi(1 + x^2)}$, $-\infty < x < \infty$.

This is the Cauchy density, for which we have seen that $E|X| = \infty$. Thus $E\{E\{X|Y\}\}$ exists, but not $EX$, and (7) does not hold.


Theorem 2. If $EX^2 < \infty$, then
(8) $\operatorname{var}(X) = \operatorname{var}(E\{X|Y\}) + E(\operatorname{var}\{X|Y\})$.

Proof. The right-hand side of (8) equals, by definition,
$\{E(E\{X|Y\})^2 - [E(E\{X|Y\})]^2\} + E(E\{X^2|Y\} - E^2\{X|Y\})$
$= \{E(E^2\{X|Y\}) - (EX)^2\} + EX^2 - E(E^2\{X|Y\})$
$= \operatorname{var}(X)$.
Corollary. If $EX^2 < \infty$, then
(9) $\operatorname{var}(X) \ge \operatorname{var}(E\{X|Y\})$
with equality if and only if $X$ is a function of $Y$.
Equation (9) follows immediately from (8). The equality in (9) holds if and only if
$E(\operatorname{var}\{X|Y\}) = E(X - E\{X|Y\})^2 = 0$,
which holds if and only if
(10) $X = E\{X|Y\}$.
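
Relations (7) and (8) can be verified on a small example. The sketch below assumes the joint pmf of the urn example of this section drawn without replacement (entered by hand here) and computes each side exactly; the helper names are ours.

```python
from fractions import Fraction

# Joint pmf of (X, Y) for the urn example, sampling without replacement.
pmf = {(0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
       (1, 0): Fraction(3, 10), (1, 1): Fraction(3, 10)}


def marginal_y(y):
    return sum(p for (_, yy), p in pmf.items() if yy == y)


def cond_moment(r, y):
    """E{X^r | Y = y}."""
    return sum(x ** r * p for (x, yy), p in pmf.items() if yy == y) / marginal_y(y)


ex = sum(x * p for (x, _), p in pmf.items())
var_x = sum(x * x * p for (x, _), p in pmf.items()) - ex ** 2

ys = (0, 1)
cond_mean = {y: cond_moment(1, y) for y in ys}
cond_var = {y: cond_moment(2, y) - cond_moment(1, y) ** 2 for y in ys}

# Relation (7): E{E{X|Y}} = EX.
tower = sum(marginal_y(y) * cond_mean[y] for y in ys)

# Relation (8): var(X) = var(E{X|Y}) + E(var{X|Y}).
var_of_mean = sum(marginal_y(y) * cond_mean[y] ** 2 for y in ys) - tower ** 2
mean_of_var = sum(marginal_y(y) * cond_var[y] for y in ys)

print(ex, tower, var_x, var_of_mean, mean_of_var)
```

The two components of the variance add up to $\operatorname{var}(X)$, and the first component alone is strictly smaller, as (9) requires when $X$ is not a function of $Y$.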


We recall that, if $X$ and $Y$ are independent,
$F_{X|Y}(x|y) = F_X(x)$ for all $x$,
and
$F_{Y|X}(y|x) = F_Y(y)$ for all $y$.


It follows, therefore, that if $E\{h(X)\}$ exists and $X$ and $Y$ are independent, then


(11) $E\{h(X)|Y\} = E\{h(X)\}$.
Similarly
(12) $E\{h(Y)|X\} = E\{h(Y)\}$,


provided that $E\{h(Y)\}$ exists, and $X$ and $Y$ are independent. Note that (8) becomes trivial if $X$ and $Y$ are independent.


PROBLEMS 4.7 


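1. Let $X$ be an rv with pdf given by

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$.

Find $E\{X|a < X < b\}$, where $a$ and $b$ are constants.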
2. (a) Let $(X, Y)$ be jointly distributed with density

$f(x, y) = y(1 + x)^{-4}\,e^{-y(1 + x)^{-1}}$ if $x, y \ge 0$, and $= 0$ otherwise.

Find $E\{Y|X\}$.
(b) Do the same for the joint density


$f(x, y) = \frac{4}{5}(x + 3y)\,e^{-x-2y}$ if $x, y \ge 0$, and $= 0$ otherwise.


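3. Let $(X, Y)$ be jointly distributed with bivariate normal density


$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]\right\}$.

Find $E\{X|y\}$ and $E\{Y|x\}$. (Here $\mu_1, \mu_2 \in \mathcal{R}$, $\sigma_1, \sigma_2 > 0$, and $|\rho| < 1$.)
4. Find $E\{Y - E\{Y|X\}\}^2$.
5. Let $X$ and $Y$ be rv's, and $\varphi(X)$ be an rv. Assume that $EY$ and $E\varphi(X)$ exist. Show that (a) $E\{\varphi(X)|X\} = \varphi(X)$, and (b) $E\{\varphi(X)Y|X\} = \varphi(X)E\{Y|X\}$.


[Results (a) and (b) hold in general. We ask the reader to supply the proof in the discrete case only.]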


4.8 THE PRINCIPLE OF LEAST SQUARES


Let $X$, $Y$ be dependent rv's, and suppose that we wish to find the functional relationship between $X$ and $Y$. Suppose also that this relationship is $y = h(x)$. We assume that both $EY^2$ and $E[h(X)]^2$ exist. Our object is to find the function $h(x)$. The principle of least squares consists of choosing $h(x)$ so that the quantity


$E\{Y - h(X)\}^2$

is minimum. If $(X, Y)$ is of the continuous type, this means that we want


$E\{Y - h(X)\}^2 = \int\int \{y - h(x)\}^2 f(x, y)\,dx\,dy$
$= \int f_1(x) \left[\int \{y - h(x)\}^2 f_{Y|X}(y|x)\,dy\right] dx$

to be minimum. Clearly this will be achieved if we minimize

$\int \{y - h(x)\}^2 f_{Y|X}(y|x)\,dy$.

From Theorem 3.2.8 it follows that we must have


(1) $h(x) = E\{Y|x\}$.
A similar argument holds when $(X, Y)$ is of the discrete type.




Definition 1. The relation $y = E\{Y|x\}$ is called the regression of $Y$ on $X$, and $x = E\{X|y\}$ is called the regression of $X$ on $Y$.


In practice, one is frequently interested in approximating the relationship between $X$ and $Y$ with the help of a straight line,


$y = ax + b$,

say. In that case, the principle of least squares requires us to choose $a$ and $b$ so as to minimize

$L = E(Y - aX - b)^2$.

We have


$L = EY^2 + a^2EX^2 + b^2 - 2aEXY + 2abEX - 2bEY$,


where, of course, we have assumed that $EY^2 < \infty$, $EX^2 < \infty$. We solve for $a$ and $b$ from


$\frac{\partial L}{\partial a} = 2aEX^2 - 2EXY + 2bEX = 0$,
$\frac{\partial L}{\partial b} = 2b + 2aEX - 2EY = 0$;

that is, we solve $a$ and $b$ from the so-called normal equations

$aEX + b = EY$,
$aEX^2 + bEX = E(XY)$.

We get

(2) $a = \frac{E(XY) - EX\,EY}{EX^2 - (EX)^2} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}$,

(3) $b = EY - EX\,\frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}$.

Thus we have

(4) $y - EY = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}\,\{x - EX\}$.


We call (4) the line of regression of $Y$ on $X$. Similarly the line

(5) $x - EX = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)}\,\{y - EY\}$


is the line of regression of $X$ on $Y$. We wish to emphasize that one cannot simply solve (4) for $(x - EX)$ to obtain (5), because the roles of $y$ and $x$ are reversed. Relation (4) was obtained with $x$ as the causal variable, whereas (5) is obtained by taking $y$ as the causal variable and minimizing
$E(X - aY - b)^2$.
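
The coefficients (2) and (3) depend only on first and second moments, so they are simple to compute from a joint pmf. The following Python sketch assumes the joint pmf of the urn example of Section 4.7 drawn without replacement (entered by hand); `regression_line` is an illustrative name of ours.

```python
from fractions import Fraction


def regression_line(pmf):
    """Slope a and intercept b of the line of regression of Y on X, from (2) and (3)."""
    ex = sum(x * p for (x, _), p in pmf.items())
    ey = sum(y * p for (_, y), p in pmf.items())
    exy = sum(x * y * p for (x, y), p in pmf.items())
    ex2 = sum(x * x * p for (x, _), p in pmf.items())
    a = (exy - ex * ey) / (ex2 - ex ** 2)   # cov(X, Y) / var(X)
    b = ey - a * ex
    return a, b


# Urn example, sampling without replacement (1 = red, 0 = green).
pmf = {(0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
       (1, 0): Fraction(3, 10), (1, 1): Fraction(3, 10)}

a, b = regression_line(pmf)
print(a, b, b + a * 0, b + a * 1)
```

Since $X$ takes only two values here, the fitted line passes exactly through the conditional means $E\{Y|x\}$ at $x = 0$ and $x = 1$; in general the line is only the best linear approximation to the regression of $Y$ on $X$.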

The procedure for fitting a polynomial of any order is similar. Thus, if we wish to fit


$y = a + bx + cx^2 + dx^3$,

the normal equations are

$a + bEX + cEX^2 + dEX^3 = EY$,

$aEX + bEX^2 + cEX^3 + dEX^4 = EXY$,

$aEX^2 + bEX^3 + cEX^4 + dEX^5 = EX^2Y$,

$aEX^3 + bEX^4 + cEX^5 + dEX^6 = EX^3Y$,

and so on.
Definition 2. Let $EX^2$, $EY^2$ exist. Then the quantity $\operatorname{cov}(X, Y)/\operatorname{var}(X)$ is called the coefficient of regression of $Y$ on $X$. Similarly, $\operatorname{cov}(X, Y)/\operatorname{var}(Y)$ is called the coefficient of regression of $X$ on $Y$. We write


$\beta_{Y|X} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}$, $\beta_{X|Y} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)}$.


Definition 3. If $EX^2$, $EY^2$ exist, we define the correlation coefficient between $X$ and $Y$ as


(6) $\rho = \frac{\operatorname{cov}(X, Y)}{SD(X)\,SD(Y)} = \frac{E(XY) - EX\,EY}{\sqrt{EX^2 - (EX)^2}\,\sqrt{EY^2 - (EY)^2}}$.

Note that


$\rho^2 = \beta_{Y|X}\,\beta_{X|Y}$

and the sign of $\rho$ is the same as that of $\operatorname{cov}(X, Y)$.


Definition 4. We say that two rv's $X$ and $Y$ are uncorrelated if and only if $\rho = 0$.


Clearly, $\rho = 0$ if and only if $\operatorname{cov}(X, Y) = 0$. If $X$ and $Y$ are independent, then $\operatorname{cov}(X, Y) = 0$ and $X$ and $Y$ are uncorrelated. We emphasize that, if $X$ and $Y$ are uncorrelated [and hence $\operatorname{cov}(X, Y) = 0$], $X$ and $Y$ are not necessarily independent.


Example 1. Let $U$ and $V$ be two rv's with the same mean and the same variance. Write $X = U + V$, $Y = U - V$. Then

$\operatorname{cov}(X, Y) = E(U^2 - V^2) - E(U + V)\,E(U - V) = 0$,


so that $X$ and $Y$ are uncorrelated but not necessarily independent.


Some basic properties of the correlation coefficient are given in the following theorems.


Theorem 1.


(a) The correlation coefficient between two rv's $X$ and $Y$ satisfies


$|\rho| \le 1$.
(b) The equality $|\rho| = 1$ holds if and only if there exist constants $a$ and $b$ such that


$P\{Y = aX + b\} = 1$.
Proof. By the Cauchy-Schwarz inequality
$[\operatorname{cov}(X, Y)]^2 \le \operatorname{var}(X)\operatorname{var}(Y)$
with equality if and only if there exist constants $a$ and $b$ such that
$P\{Y = aX + b\} = 1$.


Corollary. The two lines of regression (4) and (5) coincide if and only if $\rho = +1$ or $\rho = -1$.


Theorem 2. Let $EX^2 < \infty$, $EY^2 < \infty$, and let $U = aX + b$, $V = cY + d$, where $a \ne 0$ and $c \ne 0$. Then

$\rho_{X,Y} = \pm\rho_{U,V}$,

where $\rho_{X,Y}$ and $\rho_{U,V}$, respectively, are the correlation coefficients between $X$ and $Y$ and $U$ and $V$.


Proof. The proof is simple and is left as an exercise.
Example 2. Let $X$, $Y$ be identically distributed, with common pmf
$P\{X = k\} = \frac{1}{N}$, $k = 1, 2, \ldots, N$ ($N > 1$).


Then
$EX = EY = \frac{N + 1}{2}$, $EX^2 = EY^2 = \frac{(N + 1)(2N + 1)}{6}$,


so that

$\operatorname{var}(X) = \operatorname{var}(Y) = \frac{N^2 - 1}{12}$.

Also,
$E(XY) = \frac{1}{2}\{EX^2 + EY^2 - E(X - Y)^2\}$
$= \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2}$.

Thus

$\operatorname{cov}(X, Y) = \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2} - \frac{(N + 1)^2}{4} = \frac{(N + 1)(N - 1)}{12} - \frac{1}{2}E(X - Y)^2$,

and

$\rho_{XY} = \frac{(N^2 - 1)/12 - E(X - Y)^2/2}{(N^2 - 1)/12} = 1 - \frac{6E(X - Y)^2}{N^2 - 1}$.


If $P\{X = Y\} = 1$, then $\rho = 1$, and conversely. If $P\{Y = N + 1 - X\} = 1$, then


$E(X - Y)^2 = E(2X - N - 1)^2 = \frac{4(N + 1)(2N + 1)}{6} - 2(N + 1)^2 + (N + 1)^2 = \frac{N^2 - 1}{3}$,


and it follows that $\rho_{XY} = -1$. Conversely, if $\rho_{XY} = -1$, from Theorem 1 it follows that $Y = -aX + b$ with probability 1 for some $a > 0$ and some real number $b$. To find $a$ and $b$, we note that $EY = -aEX + b$, so that $b = [(N + 1)/2](1 + a)$. Also $EY^2 = E(b - aX)^2$, which yields


$(1 - a^2)EX^2 + 2abEX - b^2 = 0$.

Substituting for $b$ in terms of $a$ and the values of $EX^2$ and $EX$, we see that $a^2 = 1$, so that $a = 1$. Hence $b = N + 1$, and it follows that $Y = N + 1 - X$ with probability 1.
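
The formula $\rho_{XY} = 1 - 6E(X - Y)^2/(N^2 - 1)$ can be compared with a direct computation of the correlation coefficient when $Y$ is a function of $X$. The Python sketch below assumes $N = 6$ and takes $Y$ to be a rearrangement of the values of $X$ (so that $Y$ has the same uniform pmf and $\operatorname{var}(Y) = \operatorname{var}(X)$); the function name and the choice of $N$ are ours.

```python
from fractions import Fraction


def rho_uniform(N, y_of_x):
    """Return (direct rho, rho from 1 - 6 E(X - Y)^2 / (N^2 - 1)) for Y = y_of_x(X).

    X is uniform on 1..N; y_of_x is assumed to permute 1..N, so var(Y) = var(X).
    """
    p = Fraction(1, N)
    xs = range(1, N + 1)
    ex = sum(p * x for x in xs)
    ey = sum(p * y_of_x(x) for x in xs)
    exy = sum(p * x * y_of_x(x) for x in xs)
    var_x = sum(p * x * x for x in xs) - ex ** 2
    mean_sq_diff = sum(p * (x - y_of_x(x)) ** 2 for x in xs)
    direct = (exy - ex * ey) / var_x          # equals rho because var(Y) = var(X)
    shortcut = 1 - 6 * mean_sq_diff / (N * N - 1)
    return direct, shortcut


N = 6
same = rho_uniform(N, lambda x: x)             # P{X = Y} = 1
mirror = rho_uniform(N, lambda x: N + 1 - x)   # P{Y = N + 1 - X} = 1
print(same, mirror)
```

The two extreme cases of the example, $Y = X$ and $Y = N + 1 - X$, give $\rho = 1$ and $\rho = -1$ by either route.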


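Example 3. Let $(X, Y)$ have the joint pdf given by

$f(x, y) = 1$ if $|y| < x$, $0 < x < 1$, and $= 0$ otherwise.

Then

$f_1(x) = 2x$ if $0 < x < 1$, and $= 0$ otherwise,

and

$f_2(y) = 1 - y$ if $0 < y < 1$, $= 1 + y$ if $-1 < y \le 0$, and $= 0$ otherwise,


are the marginal pdf's. The conditional densities are given by


$f_{X|Y}(x|y) = \frac{1}{1 - y}$ if $y < x < 1$, $0 < y < 1$; $= \frac{1}{1 + y}$ if $-y < x < 1$, $-1 < y \le 0$; and $= 0$ otherwise;


$f_{Y|X}(y|x) = \frac{1}{2x}$ if $|y| < x$, $0 < x < 1$, and $= 0$ otherwise;


$E\{Y|x\} = \int_{-x}^{x} \frac{y}{2x}\,dy = 0$;


and

$E\{X|y\} = \frac{1 + y}{2}$ if $0 < y < 1$, and $= \frac{1 - y}{2}$ if $-1 < y \le 0$.

Also,

$\operatorname{cov}(X, Y) = \int\int xy\,dx\,dy = \int_0^1 x\left(\int_{-x}^{x} y\,dy\right)dx = 0$,

so that


$\rho_{XY} = 0$.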


Example 4. Let $(X, Y)$ be jointly distributed with density


$f(x, y) = e^{-y}$ if $0 < x < y < \infty$, and $= 0$ otherwise.


Then the marginal densities are


$f_1(x) = e^{-x}$ if $x > 0$, and $= 0$ otherwise;
$f_2(y) = ye^{-y}$ if $0 < y < \infty$, and $= 0$ otherwise;


and the conditional densities are

$f_{X|Y}(x|y) = \frac{1}{y}$, $0 < x < y$;


$f_{Y|X}(y|x) = e^{-(y - x)}$, $0 < x < y < \infty$;


$E\{X|y\} = \int_0^y \frac{x}{y}\,dx = \frac{y}{2}$;


$E\{Y|x\} = \int_x^{\infty} y\,e^{-(y - x)}\,dy = \int_0^{\infty} (u + x)\,e^{-u}\,du = x + 1$.


It follows that the coefficient of regression of $X$ on $Y$ is $\frac{1}{2}$, and that of $Y$ on $X$ is 1. Therefore


$\rho^2 = \frac{1}{2}$;

hence, the sign of $\rho$ being that of the regression coefficients,

$\rho = \frac{1}{\sqrt{2}}$.


PROBLEMS 4.8


1. Let $(X, Y)$ have the joint pdf
$f(x, y) = x + y$ if $0 < x < 1$, $0 < y < 1$, and $= 0$ otherwise.
Find the correlation coefficient between $X$ and $Y$.


2. Show that the acute angle, $\theta$, between the two lines of regression is given by
$\tan\theta = \frac{1 - \rho^2}{|\rho|} \cdot \frac{\sigma_X\sigma_Y}{\sigma_X^2 + \sigma_Y^2}$,
where $\sigma_X$, $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively, and $\rho$ is the correlation coefficient.

3. Let $(\Omega, \mathcal{S}, P)$ be a probability space, and $A, B \in \mathcal{S}$ with $0 < PA < 1$, $0 < PB < 1$. Define $\rho(A, B)$ by $\rho(A, B)$ = correlation coefficient between rv's $I_A$ and $I_B$, where $I_A$, $I_B$ are the indicator functions of $A$ and $B$, respectively. Express $\rho(A, B)$ in terms of $PA$, $PB$, and $P(AB)$, and conclude that $\rho(A, B) = 0$ if and only if $A$ and $B$ are independent. What happens if $A = B$ or if $A = B^c$?


(a) Show that
$\rho(A, B) > 0 \Leftrightarrow P\{A|B\} > P(A) \Leftrightarrow P\{B|A\} > P(B)$,
and
$\rho(A, B) < 0 \Leftrightarrow P\{A|B\} < PA \Leftrightarrow P\{B|A\} < PB$.
(b) Show that


$\rho(A, B) = \frac{P(AB)\,P(A^cB^c) - P(AB^c)\,P(A^cB)}{(PA\,PA^c\,PB\,PB^c)^{1/2}}$.

4. Let $X_1, X_2, \ldots, X_n$ be iid rv's, and define

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$.


Suppose that the common distribution is symmetric. Assuming the existence of moments of appropriate order, show that $\operatorname{cov}(\bar{X}, S^2) = 0$.


5. Let $X$, $Y$ be iid rv's with common standard normal density

$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$, $-\infty < x < \infty$.

Let $U = X + Y$ and $V = X^2 + Y^2$. Find the mgf of the random variable $(U, V)$. Also, find the correlation coefficient between $U$ and $V$. Are $U$ and $V$ independent?
6. Let $X$ and $Y$ be two discrete rv's:

$P\{X = x_1\} = p_1$, $P\{X = x_2\} = 1 - p_1$;
and

$P\{Y = y_1\} = p_2$, $P\{Y = y_2\} = 1 - p_2$.


Show that $X$ and $Y$ are independent if and only if the correlation coefficient between $X$ and $Y$ is zero.


7. Two fair coins, each with faces numbered 1 and 2, are thrown. Let $X$ denote the sum of the two numbers obtained, and $Y$, the maximum of the two numbers obtained. Find the correlation coefficient between $X$ and $Y$.

8. Let $X$ and $Y$ be dependent rv's with common means 0, variances 1, and correlation coefficient $\rho$. Show that


$E\{\max(X^2, Y^2)\} \le 1 + \sqrt{1 - \rho^2}$.
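9. Let $X_1$, $X_2$ be independent normal rv's with density functions


$f_i(x) = \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_i}{\sigma_i}\right)^2\right]$, $-\infty < x < \infty$; $i = 1, 2$.


Also let

$Z = X_1\cos\theta + X_2\sin\theta$ and $W = X_2\cos\theta - X_1\sin\theta$.
Find the correlation coefficient between $Z$ and $W$, and show that

$0 \le \rho^2 \le \left(\frac{\sigma_1^2 - \sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)^2$,

where $\rho$ denotes the correlation coefficient between $Z$ and $W$.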
10. Find $E(Y - aX - b)^2$, where $a$ and $b$ are given by (2) and (3), respectively.
11. Let $(X_1, X_2, \ldots, X_n)$ be an rv such that the correlation coefficient between each pair $X_i$, $X_j$, $i \ne j$, is $\rho$. Show that $-(n - 1)^{-1} \le \rho \le 1$.


12. Let $X_1, X_2, \ldots, X_{m+n}$ be iid rv's with finite second moment. Let $S_k = \sum_{j=1}^{k} X_j$, $k = 1, 2, \ldots, m + n$. Find the correlation coefficient between $S_n$ and $S_{m+n} - S_m$, where $n > m$.

13. Let $f$ be the pdf of a positive rv, and write


$g(x, y) = \frac{f(x + y)}{x + y}$ if $x > 0$, $y > 0$, and $= 0$ otherwise.


Show that $g$ is a density function in the plane. If the $m$th moment of $f$ exists for some positive integer $m$, find $EX^m$. Compute the means and variances of $X$ and $Y$ and the correlation coefficient between $X$ and $Y$ in terms of moments of $f$.
(Adapted from Feller [29], 100)


14. A die is thrown $n + 2$ times. After each throw a $+$ sign is recorded for 4, 5, or 6, and a $-$ sign for 1, 2, or 3, the signs forming an ordered sequence. Each sign, except the first and the last, is attached a characteristic rv that assumes the value 1 if both the neighboring signs differ from the one between them and 0 otherwise. Let $X_1, X_2, \ldots, X_n$ be these characteristic rv's, where $X_i$ corresponds to the $(i + 1)$st sign ($i = 1, 2, \ldots, n$) in the sequence. Show that

$E\left\{\sum_{i=1}^{n} X_i\right\} = \frac{n}{4}$ and $\operatorname{var}\left\{\sum_{i=1}^{n} X_i\right\} = \frac{5n - 2}{16}$.

15. Let $(X, Y)$ be jointly distributed with pdf $f$ defined by $f(x, y) = \frac{1}{2}$ inside the square with corners at the points $(0, 1)$, $(1, 0)$, $(-1, 0)$, $(0, -1)$ in the $(x, y)$-plane, and $f(x, y) = 0$ otherwise. Are $X$, $Y$ independent? Are they uncorrelated?


16. Predict the length L of a word selected at random from the sentence 
IT IS TOO GOOD TO BE TRUE. 


Let X be the number of O's in the word selected. Find the joint pmf of X and L.
Compute the best least square predictor of L, given X, and compute the expected
loss if you lose the square of your error.


17. Let (Ω, 𝒮, P) be a probability space, and let A and B ∈ 𝒮 be such that
Р(А) = 4, P(B4)-1 P(A|B) = i 


Define X(ω) = I_A(ω), Y(ω) = I_B(ω) for all ω ∈ Ω. Find EX, EY, var (X), var (Y),
and the correlation coefficient between X and Y. Are X and Y independent?


CHAPTER 5


Some Special Distributions 


5.1 INTRODUCTION


In preceding chapters we studied probability distributions in general. In this 
chapter we study some commonly occurring probability distributions and 
investigate their basic properties. The results of this chapter will be of
considerable use in theoretical as well as practical applications. We begin with
some discrete distributions in Section 2 and follow with some continuous 
models in Section 3. Section 4 deals with bivariate and multivariate normal 
distributions, while in Section 5 we discuss the exponential family of distri- 
butions. 


5.2 SOME DISCRETE DISTRIBUTIONS 


In this section we study some well-known univariate and multivariate dis- 
crete distributions and describe their important properties. 


a. The Degenerate Distribution 


The simplest distribution is that of an rv X degenerate at point k, that is,
P{X = k} = 1 (and P{X = x} = 0 for x ≠ k). If we define

(1)  $$\varepsilon(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0, \end{cases}$$

the df of the rv X is ε(x - k). Clearly, EX^l = k^l, l = 1, 2, ..., and M(t) = e^{tk}.
In particular, var (X) = 0. This property characterizes a degenerate rv. As
we shall see, the degenerate rv plays an important role in the study of limit
theorems.

b. The Two-Point Distribution


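We say that an rv X has a two-point distribution if it takes two values, x_1
and x_2, with probabilities

$$P\{X = x_1\} = p, \quad P\{X = x_2\} = 1 - p, \quad 0 < p < 1.$$

We may write

(2)  $$X = x_1 I_{[X = x_1]} + x_2 I_{[X = x_2]}.$$

The df of X is given by

(3)  $$F(x) = p\,\varepsilon(x - x_1) + (1 - p)\,\varepsilon(x - x_2).$$

Also,

(4)  $$EX^k = p x_1^k + (1 - p) x_2^k, \quad k = 1, 2, \ldots,$$

(5)  $$M(t) = p e^{t x_1} + (1 - p) e^{t x_2} \quad \text{for all } t.$$

In particular,

(6)  $$EX = p x_1 + (1 - p) x_2$$

and

(7)  $$\operatorname{var}(X) = p(1 - p)(x_1 - x_2)^2.$$

If x_1 = 1, x_2 = 0, we get the important Bernoulli rv:

(8)  $$P\{X = 1\} = p, \quad P\{X = 0\} = 1 - p, \quad 0 < p < 1.$$

For a Bernoulli rv X with parameter p, we write X ~ b(1, p) and have

(9)  $$EX = p, \quad \operatorname{var}(X) = p(1 - p), \quad M(t) = 1 + p(e^t - 1), \quad \text{all } t.$$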


Bernoulli rv's occur in practice, for example, in coin-tossing experiments.
Suppose that P{H} = p, 0 < p < 1, and P{T} = 1 - p. Define rv X so that
X(H) = 1 and X(T) = 0. Then P{X = 1} = p and P{X = 0} = 1 - p. Each
repetition of the experiment will be called a trial. More generally, any
nontrivial experiment can be dichotomized to yield a Bernoulli model. Let
(Ω, 𝒮, P) be the sample space of an experiment, and let A ∈ 𝒮 with
P(A) = p > 0. Then P(A^c) = 1 - p. Each performance of the experiment
is a Bernoulli trial. It will be convenient to call the occurrence of event
A a success, and the occurrence of A^c, a failure.


Example 1 (Sabharwal [108]). In a sequence of n Bernoulli trials with
constant probability p of success (S), and 1 - p of failure (F), let Y_n denote
the number of times the combination SF occurs. To find EY_n and var (Y_n),
let X_i represent the event that occurs on the ith trial, and define rv's

$$f(X_i, X_{i+1}) = \begin{cases} 1 & \text{if } X_i = S,\ X_{i+1} = F, \\ 0 & \text{otherwise} \end{cases} \quad (i = 1, 2, \ldots, n - 1).$$

Then

$$Y_n = \sum_{i=1}^{n-1} f(X_i, X_{i+1})$$

and

$$EY_n = (n - 1)p(1 - p).$$

Also,

$$EY_n^2 = E\left\{\sum_{i=1}^{n-1} f^2(X_i, X_{i+1})\right\} + E\left\{\sum_{i \ne j} f(X_i, X_{i+1}) f(X_j, X_{j+1})\right\} = (n - 1)p(1 - p) + (n - 2)(n - 3)p^2(1 - p)^2,$$

so that

$$\operatorname{var}(Y_n) = p(1 - p)\{n - 1 + p(1 - p)(5 - 3n)\}.$$

If p = 1/2, then

$$EY_n = \frac{n - 1}{4} \quad \text{and} \quad \operatorname{var}(Y_n) = \frac{n + 1}{16}.$$
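The closed forms for the number of SF pairs, EY_n = (n - 1)p(1 - p) and var (Y_n) = p(1 - p){n - 1 + p(1 - p)(5 - 3n)}, can be checked for small n by brute force. The following Python sketch is an illustration only; the function names are hypothetical and not part of the text. It enumerates all 2^n outcome sequences with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def sf_moments_by_enumeration(n, p):
    """Exact mean and variance of Y_n, the number of 'SF' pairs in n trials."""
    p = Fraction(p)
    q = 1 - p
    mean = Fraction(0)
    second_moment = Fraction(0)
    for outcome in product("SF", repeat=n):
        # Probability of this particular sequence of successes and failures.
        prob = Fraction(1)
        for symbol in outcome:
            prob *= p if symbol == "S" else q
        # Count positions i where trial i is S and trial i + 1 is F.
        y = sum(
            1
            for i in range(n - 1)
            if outcome[i] == "S" and outcome[i + 1] == "F"
        )
        mean += prob * y
        second_moment += prob * y * y
    return mean, second_moment - mean ** 2


def sf_moments_by_formula(n, p):
    """Closed forms for EY_n and var(Y_n) as stated in the example."""
    p = Fraction(p)
    pq = p * (1 - p)
    return (n - 1) * pq, pq * (n - 1 + pq * (5 - 3 * n))
```

Because the enumeration grows as 2^n, it is only practical for small n; it is meant as a sanity check on the algebra rather than as a computational method.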


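c. The Uniform Distribution on n Points


X is said to have a uniform distribution on n points {x_1, x_2, ..., x_n} if its
pmf is of the form

(10)  $$P\{X = x_i\} = \frac{1}{n}, \quad i = 1, 2, \ldots, n.$$

Thus we may write

$$X = \sum_{i=1}^{n} x_i I_{[X = x_i]} \quad \text{and} \quad F(x) = \frac{1}{n}\sum_{i=1}^{n} \varepsilon(x - x_i),$$

(11)  $$EX = \frac{1}{n}\sum_{i=1}^{n} x_i,$$

(12)  $$EX^l = \frac{1}{n}\sum_{i=1}^{n} x_i^l, \quad l = 1, 2, \ldots,$$

and

(13)  $$\operatorname{var}(X) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$

if we write $\bar{x} = \sum_{i=1}^{n} x_i / n$. Also,

(14)  $$M(t) = \frac{1}{n}\sum_{i=1}^{n} e^{t x_i} \quad \text{for all } t.$$

If, in particular, x_i = i, i = 1, 2, ..., n,

(15)  $$EX = \frac{n + 1}{2}, \quad EX^2 = \frac{(n + 1)(2n + 1)}{6},$$

(16)  $$\operatorname{var}(X) = \frac{n^2 - 1}{12}.$$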


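Example 2. A box contains tickets numbered 1 to N. Let X be the largest
number drawn in n random drawings with replacement.

Then P{X ≤ k} = (k/N)^n, so that

$$P\{X = k\} = P\{X \le k\} - P\{X \le k - 1\} = \left(\frac{k}{N}\right)^n - \left(\frac{k - 1}{N}\right)^n.$$

Also,

$$EX = N^{-n}\sum_{k=1}^{N}\left[k^{n+1} - (k - 1)^{n+1} - (k - 1)^n\right] = N^{-n}\left[N^{n+1} - \sum_{k=1}^{N}(k - 1)^n\right].$$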
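The pmf of the largest ticket and the closed form for its mean can be compared directly. The sketch below is illustrative (the function names are assumptions, not from the text): it builds P{X = k} = (k/N)^n - ((k - 1)/N)^n with exact fractions and evaluates EX = N^{-n}[N^{n+1} - Σ_{k=1}^{N} (k - 1)^n].

```python
from fractions import Fraction


def max_ticket_pmf(N, n):
    """pmf of the largest of n draws with replacement from tickets 1..N."""
    return {
        k: Fraction(k, N) ** n - Fraction(k - 1, N) ** n
        for k in range(1, N + 1)
    }


def max_ticket_mean(N, n):
    """Closed form N^{-n} [N^{n+1} - sum_{k=1}^{N} (k - 1)^n]."""
    tail_sum = sum((k - 1) ** n for k in range(1, N + 1))
    return Fraction(N ** (n + 1) - tail_sum, N ** n)
```

Summing k P{X = k} over the dictionary returned by `max_ticket_pmf` should reproduce `max_ticket_mean` exactly, since both are computed in rational arithmetic.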


d. The Binomial Distribution


We say that X has a binomial distribution with parameter p if its pmf is
given by

(17)  $$p_k = P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \ldots, n;\ 0 \le p \le 1.$$

Since $\sum_{k=0}^{n} p_k = [p + (1 - p)]^n = 1$, the p_k's indeed define a pmf. If X
has pmf (17), we will write X ~ b(n, p). This is consistent with the notation
for a Bernoulli rv. We have

$$F(x) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k}\,\varepsilon(x - k).$$

In Example 3.2.5 we showed that

(18)  $$EX = np,$$

(19)  $$EX^2 = n(n - 1)p^2 + np,$$

and

(20)  $$\operatorname{var}(X) = np(1 - p) = npq,$$

where q = 1 - p. Also,

$$M(t) = \sum_{k=0}^{n} e^{tk}\binom{n}{k} p^k (1 - p)^{n-k}$$

(21)  $$= (q + pe^t)^n \quad \text{for all } t.$$

The pgf of X ~ b(n, p) is given by P(s) = {1 - p(1 - s)}^n, |s| ≤ 1.

The binomial distribution can also be considered as the distribution of the sum
of n independent, identically distributed b(1, p) random variables. If we toss
a coin, with constant probability p of heads and 1 - p of tails, n times, the
distribution of the number of heads is given by (17). Alternatively, if we
write

$$X_k = \begin{cases} 1 & \text{if the kth toss results in a head,} \\ 0 & \text{otherwise,} \end{cases}$$

the number of heads in n trials is the sum S_n = X_1 + X_2 + ... + X_n. Also

$$P\{X_k = 1\} = p, \quad P\{X_k = 0\} = 1 - p, \quad k = 1, 2, \ldots, n.$$

Thus

$$ES_n = \sum_{i=1}^{n} EX_i = np,$$

$$\operatorname{var}(S_n) = \sum_{i=1}^{n} \operatorname{var}(X_i) = np(1 - p),$$

and

$$M(t) = \prod_{i=1}^{n} Ee^{tX_i} = (q + pe^t)^n.$$


Theorem 1. Let X_i (i = 1, 2, ..., k) be independent rv's with X_i ~ b(n_i, p).
Then S_k = Σ_{i=1}^{k} X_i has a b(n_1 + n_2 + ... + n_k, p) distribution.


Proof.  $$M_{S_k}(t) = \prod_{i=1}^{k} M_{X_i}(t) = \prod_{i=1}^{k} (q + pe^t)^{n_i},$$

and the result follows from uniqueness of the mgf.


Corollary. If X_i (i = 1, 2, ..., k) are iid rv's with common pmf b(n, p), then
S_k has a b(nk, p) distribution.
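The additive property of the binomial family can also be seen by convolving pmf's directly, without the mgf. The following Python sketch is an illustration under that alternative route; the helper names are hypothetical. It convolves a b(n_1, p) pmf with a b(n_2, p) pmf and compares the result with the b(n_1 + n_2, p) pmf.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """List of P{X = k}, k = 0, ..., n, for X ~ b(n, p)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """pmf of the sum of two independent nonnegative integer-valued rv's."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out
```

With p given as a `Fraction`, the comparison `convolve(binomial_pmf(3, p), binomial_pmf(5, p)) == binomial_pmf(8, p)` is exact; with floating-point p a tolerance would be needed.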


Actually the additive property described in Theorem 1 characterizes the
binomial distribution in the following sense. Let X and Y be two independent,
nonnegative, finite integer-valued rv's and let Z = X + Y. Then Z is a
binomial rv with parameter p if and only if X and Y are binomial rv's with the
same parameter p. The "only if" part is due to Shanbhag and Basawa [113]
and will not be proved here.


Example 3. A fair die is rolled n times. The probability of obtaining
exactly one 6 is n(1/6)(5/6)^{n-1}, the probability of obtaining no 6 is (5/6)^n, and the
probability of obtaining at least one 6 is 1 - (5/6)^n.

The number of trials needed for the probability of at least one 6 to
be ≥ 1/2 is given by the smallest integer n such that

$$1 - \left(\frac{5}{6}\right)^n \ge \frac{1}{2}$$

or

$$n \log\left(\frac{5}{6}\right) \le -\log 2.$$

Therefore

$$n \ge \frac{\log 2}{\log 1.2} \approx 3.8,$$

so that four trials are needed.


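Example 4. Here r balls are distributed in n cells so that each of n^r
possible arrangements has probability n^{-r}. We are interested in the probability
p_k that a specified cell has exactly k balls (k = 0, 1, 2, ..., r). Then the
distribution of X, the number of balls in the specified cell, is binomial with
parameters r and 1/n, that is,

$$p_k = P\{X = k\} = \binom{r}{k}\left(\frac{1}{n}\right)^k\left(1 - \frac{1}{n}\right)^{r-k}, \quad k = 0, 1, 2, \ldots, r.$$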


e. The Negative Binomial Distribution (Pascal or Waiting Time Distribution)

Let (Ω, 𝒮, P) be a probability space of a given statistical experiment, and
let A ∈ 𝒮 with P(A) = p. On any performance of the experiment, if A happens
we call it a success, otherwise a failure. Consider a succession of trials of
this experiment, and let us compute the probability of observing exactly r
successes, where r ≥ 1 is a fixed integer. If X denotes the number of failures
that precede the rth success, X + r is the total number of replications needed to
produce r successes. This will happen if and only if the last trial results in a
success and among the previous (r + X - 1) trials there are exactly X failures.
It follows by independence that

(22)  $$P\{X = x\} = \binom{x + r - 1}{x} p^r (1 - p)^x, \quad x = 0, 1, 2, \ldots.$$

Rewriting (22) in the form (see P. 2.7)

(23)  $$P\{X = x\} = \binom{-r}{x} p^r (-q)^x, \quad x = 0, 1, 2, \ldots;\ q = 1 - p,$$

we see that

(24)  $$\sum_{x=0}^{\infty} \binom{-r}{x} (-q)^x = (1 - q)^{-r} = p^{-r}.$$

It follows that

$$\sum_{x=0}^{\infty} P\{X = x\} = 1.$$


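Definition 1. For fixed positive integer r ≥ 1 and 0 < p < 1, an rv with pmf
given by (22) is said to have a negative binomial distribution. We will use
the notation X ~ NB(r; p).


We may write

$$X = \sum_{x=0}^{\infty} x I_{[X = x]} \quad \text{and} \quad F(x) = \sum_{k=0}^{\infty} \binom{k + r - 1}{k} p^r (1 - p)^k\,\varepsilon(x - k).$$

For the mgf of X we have

$$M(t) = \sum_{x=0}^{\infty} \binom{x + r - 1}{x} p^r (1 - p)^x e^{tx} = p^r \sum_{x=0}^{\infty} \binom{x + r - 1}{x} (qe^t)^x \quad (q = 1 - p)$$

(25)  $$= p^r (1 - qe^t)^{-r} \quad \text{for } qe^t < 1.$$

The pgf is given by P(s) = p^r(1 - sq)^{-r}, |s| ≤ 1. Also,

$$EX = \sum_{x=0}^{\infty} x \binom{x + r - 1}{x} p^r q^x = r p^r \sum_{x=0}^{\infty} \binom{x + r}{x} q^{x+1}$$

(26)  $$= r p^r q (1 - q)^{-r-1} = \frac{rq}{p}.$$

Similarly, we can show that

(27)  $$\operatorname{var}(X) = \frac{rq}{p^2}.$$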


If, however, we are interested in the distribution of the number of trials
required to get r successes, we have, writing Y = X + r,

(28)  $$P\{Y = y\} = \binom{y - 1}{r - 1} p^r (1 - p)^{y-r}, \quad y = r, r + 1, \ldots,$$

(29)  $$EY = EX + r = \frac{r}{p}, \quad \operatorname{var}(Y) = \operatorname{var}(X) = \frac{rq}{p^2},$$

and

(30)  $$M_Y(t) = (pe^t)^r (1 - qe^t)^{-r} \quad \text{for } qe^t < 1.$$


Let X be a b(n, p) rv, and let Y be the rv defined in (28). If there are r
or more successes in the first n trials, at most n trials were required to
obtain the first r of these successes. We have

(31)  $$P\{X \ge r\} = P\{Y \le n\}$$

and also

(32)  $$P\{X < r\} = P\{Y > n\}.$$
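The relation between the binomial upper tail and the waiting-time df, P{X ≥ r} = P{Y ≤ n} for X ~ b(n, p) and Y the number of trials needed for r successes, can be verified numerically. The sketch below uses hypothetical helper names and exact fractions; it simply sums the two pmf's.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(n, p, r):
    """P{X >= r} for X ~ b(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r, n + 1))


def trials_cdf(r, p, n):
    """P{Y <= n}, where Y is the number of trials needed for r successes."""
    return sum(
        comb(y - 1, r - 1) * p ** r * (1 - p) ** (y - r) for y in range(r, n + 1)
    )
```

For example, with p = Fraction(1, 3), n = 10 and r = 4 the two functions return the same rational number, as the identity requires.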
In the special case when r = 1, the distribution of X is given by

(33)  $$P\{X = x\} = pq^x, \quad x = 0, 1, 2, \ldots.$$

An rv X with pmf (33) is said to have a geometric distribution. Clearly,
for the geometric distribution, we have

$$M(t) = p(1 - qe^t)^{-1},$$

(34)  $$EX = \frac{q}{p}, \quad \operatorname{var}(X) = \frac{q}{p^2}.$$


moment when the mathematician discovers that a box is empty. At that time 
the other box may contain 0, 1, 2, ..., ү matches. Let us identify success 


p_r = probability that the mathematician discovers a box empty while
the other contains r matches

event A that a 2 will show up before a 5. Let A_j be the event that a 2 shows
up on the jth trial (j = 1, 2, ...) for the first time, and a 5 does not show up
on the previous j - 1 trials. Then PA = Σ_{j=1}^{∞} PA_j, where PA_j = (1/6)(4/6)^{j-1}.
It follows that

$$P(A) = \sum_{j=1}^{\infty} \frac{1}{6}\left(\frac{4}{6}\right)^{j-1} = \frac{1}{2}.$$

Similarly the probability that a 2 will show up before a 5 or a 6 is 1/3, and
so on.


Theorem 2. Let X_1, X_2, ..., X_k be independent NB(r_i; p) rv's, i = 1, 2, ..., k,
respectively. Then S_k = Σ_{i=1}^{k} X_i is distributed as NB(r_1 + r_2 + ... + r_k; p).


Proof. The proof is left as an exercise.

Corollary. If X_1, X_2, ..., X_k are iid geometric rv's, then S_k is an NB(k; p) rv.


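Theorem 3. Let X and Y be independent rv's with pmf's NB(r_1; p) and
NB(r_2; p), respectively. Then the conditional pmf of X, given X + Y = t, is
expressed by

$$P\{X = x \mid X + Y = t\} = \frac{\binom{x + r_1 - 1}{x}\binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}.$$

If, in particular, r_1 = r_2 = 1, the conditional distribution is uniform on t + 1
points.


Proof. By Theorem 2, X + Y is an NB(r_1 + r_2; p) rv. Thus

$$P\{X = x \mid X + Y = t\} = \frac{P\{X = x, Y = t - x\}}{P\{X + Y = t\}}$$

$$= \frac{\binom{x + r_1 - 1}{x} p^{r_1}(1 - p)^x \binom{t - x + r_2 - 1}{t - x} p^{r_2}(1 - p)^{t-x}}{\binom{t + r_1 + r_2 - 1}{t} p^{r_1 + r_2}(1 - p)^t}$$

$$= \frac{\binom{x + r_1 - 1}{x}\binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}, \quad t = 0, 1, 2, \ldots.$$

If r_1 = r_2 = 1, that is, if X and Y are independent geometric rv's, then

(35)  $$P\{X = x \mid X + Y = t\} = \frac{1}{t + 1}, \quad x = 0, 1, 2, \ldots, t;\ t = 0, 1, 2, \ldots.$$

Theorem 4 (Chatterji [12]). Let X and Y be iid rv's, and let

$$P\{X = k\} = p_k > 0, \quad k = 0, 1, 2, \ldots.$$

If

(36)  $$P\{X = t \mid X + Y = t\} = P\{X = t - 1 \mid X + Y = t\} = \frac{1}{t + 1}, \quad t \ge 0,$$

then X and Y are geometric rv's.

Proof. We have

(37)  $$P\{X = t \mid X + Y = t\} = \frac{p_t p_0}{\sum_{k=0}^{t} p_k p_{t-k}} = \frac{1}{t + 1}$$

and

(38)  $$P\{X = t - 1 \mid X + Y = t\} = \frac{p_{t-1} p_1}{\sum_{k=0}^{t} p_k p_{t-k}} = \frac{1}{t + 1}.$$

It follows that

$$\frac{p_t}{p_{t-1}} = \frac{p_1}{p_0},$$

and by iteration p_t = (p_1/p_0)^t p_0. Since Σ_{t=0}^{∞} p_t = 1, we must have (p_1/p_0) < 1.
Moreover,

$$1 = p_0\,\frac{1}{1 - (p_1/p_0)},$$

so that (p_1/p_0) = 1 - p_0, and the proof is complete.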


Remark 1. It is possible to drop the requirement of identical distribution in 
Theorem 4. (See Chatterji [12].)


Theorem 5. If X has a geometric distribution, then, for any two positive
integers m and n,

(39)  $$P\{X > m + n \mid X > m\} = P\{X \ge n\}.$$

Proof. The proof is left as an exercise.

Remark 2. Theorem 5 says that the geometric distribution has no memory,
that is, the information of no successes in m trials is forgotten in
subsequent calculations.
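The lack-of-memory property can be checked directly from the tail probabilities of the geometric pmf P{X = x} = pq^x, x = 0, 1, 2, ..., for which P{X ≥ k} = q^k. The sketch below (illustrative names, exact arithmetic) computes the conditional tail P{X > m + n | X > m} as a ratio and compares it with P{X ≥ n}.

```python
from fractions import Fraction


def geometric_tail(p, k):
    """P{X >= k} = q^k for the pmf P{X = x} = p q^x, x = 0, 1, 2, ..."""
    return (1 - p) ** k


def conditional_tail(p, m, n):
    """P{X > m + n | X > m}, written with P{X > j} = P{X >= j + 1}."""
    return geometric_tail(p, m + n + 1) / geometric_tail(p, m + 1)
```

The ratio does not depend on m, which is exactly the statement that the first m failures carry no information about the remaining waiting time.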


The converse of Theorem 5 is also true.

Theorem 6. Let X be a nonnegative integer-valued rv satisfying

$$P\{X > m + n \mid X > m\} = P\{X \ge n\}$$

for any two positive integers m and n. Then X must have a geometric
distribution.


Proof. Let the pmf of X be written as

$$P\{X = k\} = p_k, \quad k = 0, 1, 2, \ldots.$$

Then

$$P\{X \ge n\} = \sum_{k=n}^{\infty} p_k$$

and

$$P\{X > m\} = \sum_{k=m+1}^{\infty} p_k = q_m, \text{ say},$$

$$P\{X > m + n \mid X > m\} = \frac{P\{X > m + n\}}{P\{X > m\}} = \frac{q_{m+n}}{q_m}.$$

Thus

$$q_{m+n} = q_m q_{n-1}, \quad q_{m+1} = q_m q_0,$$

where q_0 = P{X > 0} = p_1 + p_2 + ... = 1 - p_0. It follows that q_k =
(1 - p_0)^{k+1}, and hence p_k = q_{k-1} - q_k = (1 - p_0)^k p_0, as asserted.


Theorem 7. Let X_1, X_2, ..., X_n be independent geometric rv's with
parameters p_1, p_2, ..., p_n, respectively. Then N_n = min (X_1, X_2, ..., X_n) is also
a geometric rv with parameter

$$p = 1 - \prod_{i=1}^{n} (1 - p_i).$$

Proof. The proof is left as an exercise.


Corollary. Iid rv's X_1, X_2, ..., X_n are NB(1; p) if and only if N_n = min (X_1,
X_2, ..., X_n) is a geometric rv with parameter 1 - (1 - p)^n.


Proof. The necessity follows from Theorem 7. For the sufficiency part of
the proof let

$$P\{N_n \le k\} = 1 - P\{N_n > k\} = 1 - (1 - p)^{n(k+1)}.$$

But

$$P\{N_n \le k\} = 1 - P\{X_1 > k, X_2 > k, \ldots, X_n > k\} = 1 - [1 - F(k)]^n,$$

where F is the common df of X_1, X_2, ..., X_n. It follows that

$$[1 - F(k)] = (1 - p)^{k+1},$$

so that P{X_1 > k} = (1 - p)^{k+1}, which completes the proof.
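The statement about the minimum of independent geometric rv's can be checked from tail probabilities alone: the minimum is at least k exactly when every X_i is at least k. The following sketch (hypothetical function names, exact fractions) computes P{min = k} this way and compares it with a geometric pmf with parameter 1 - Π(1 - p_i).

```python
from fractions import Fraction
from math import prod


def geometric_pmf(p, k):
    """P{X = k} = p (1 - p)^k, k = 0, 1, 2, ..."""
    return p * (1 - p) ** k


def min_pmf(ps, k):
    """P{min(X_1, ..., X_n) = k} for independent geometric X_i with parameters ps."""
    # P{X_i >= k} = (1 - p_i)^k, and the minimum is >= k iff every X_i is.
    at_least_k = prod((1 - p) ** k for p in ps)
    at_least_k_plus_1 = prod((1 - p) ** (k + 1) for p in ps)
    return at_least_k - at_least_k_plus_1


def min_parameter(ps):
    """Parameter 1 - prod(1 - p_i) of the geometric minimum."""
    return 1 - prod(1 - p for p in ps)
```

With parameters 1/2, 1/3 and 1/5 the minimum has parameter 1 - (1/2)(2/3)(4/5) = 11/15.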


f. The Hypergeometric Distribution 


A box contains N marbles. Of these, M are taken out at random, marked,
and returned to the box. The contents of the box are then thoroughly mixed.
Next, n marbles are drawn at random from the box, and the marked marbles
are counted. If X denotes the number of marked marbles, then

(40)  $$P\{X = x\} = \binom{N}{n}^{-1}\binom{M}{x}\binom{N - M}{n - x}.$$

Since x cannot exceed M or n, we must have

(41)  $$x \le \min(M, n).$$

Also x ≥ 0 and N - M ≥ n - x, so that

(42)  $$x \ge \max(0, M + n - N).$$

Note that

$$\sum_{k=0}^{n}\binom{a}{k}\binom{b}{n - k} = \binom{a + b}{n}$$

for arbitrary numbers a, b and positive integer n. It follows that

$$\sum_{x} P\{X = x\} = \binom{N}{n}^{-1}\sum_{x}\binom{M}{x}\binom{N - M}{n - x} = 1.$$

It is easy to write the df of the rv X.


Definition 2. An rv X with pmf given by (40) is called a hypergeometric rv.
It is easy to check that

(43)  $$EX = \frac{n}{N}\,M,$$

(44)  $$EX^2 = \frac{M(M - 1)}{N(N - 1)}\,n(n - 1) + \frac{nM}{N},$$

and

(45)  $$\operatorname{var}(X) = \frac{nM}{N^2(N - 1)}\,(N - M)(N - n).$$


Example 7. A lot consisting of 50 bulbs is inspected by taking at random
10 bulbs and testing them. If the number of defective bulbs is at most 1, the
lot is accepted; otherwise it is rejected. If there are, in fact, 10 defective
bulbs in the lot, the probability of accepting the lot is

$$\frac{\binom{10}{1}\binom{40}{9}}{\binom{50}{10}} + \frac{\binom{10}{0}\binom{40}{10}}{\binom{50}{10}}.$$
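The hypergeometric pmf, its moments, and the acceptance probability of the bulb-inspection example can be evaluated with a few lines of Python. This is a sketch with hypothetical function names; `acceptance_probability` treats the lot size, number of defectives, sample size, and acceptance number as arguments.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(N, M, n, x):
    """P{X = x}: x marked items in a sample of n from N items, M of them marked."""
    if x < max(0, M + n - N) or x > min(M, n):
        return Fraction(0)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))


def acceptance_probability(N, M, n, c):
    """Probability that a sample of n contains at most c of the M defectives."""
    return sum(hypergeometric_pmf(N, M, n, x) for x in range(c + 1))
```

For the lot of 50 bulbs with 10 defectives, a sample of 10, and acceptance number 1, `float(acceptance_probability(50, 10, 10, 1))` is roughly 0.35.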

Example 8. Suppose that an urn contains b white and c black balls,
b + c = N. A ball is drawn at random, and before drawing the next ball,
s + 1 balls of the same color are added to the urn. The procedure is repeated
n times. Let X be the number of white balls drawn in n draws, X = 0,
1, 2, ..., n. We shall find the pmf of X.

First note that the probability of drawing k white balls in successive draws
is

$$\left(\frac{b}{N}\right)\left(\frac{b + s}{N + s}\right)\left(\frac{b + 2s}{N + 2s}\right)\cdots\left(\frac{b + (k - 1)s}{N + (k - 1)s}\right),$$

and the probability of drawing k white balls in the first k draws and then
n - k black balls in the next n - k draws is

(46)  $$p_k = \left(\frac{b}{N}\right)\left(\frac{b + s}{N + s}\right)\cdots\left(\frac{b + (k - 1)s}{N + (k - 1)s}\right)\left(\frac{c}{N + ks}\right)\left(\frac{c + s}{N + (k + 1)s}\right)\cdots\left(\frac{c + (n - k - 1)s}{N + (n - 1)s}\right).$$

Here p_k also gives the probability of drawing k white and n - k black balls
in any given order. It follows that

(47)  $$P\{X = k\} = \binom{n}{k} p_k.$$

An rv X with pmf given by (47) is said to have a Polya distribution. Let us
write

(48)  $$Np = b, \quad N(1 - p) = c, \quad \text{and} \quad N\alpha = s.$$

Then, with q = 1 - p, we have

$$P\{X = k\} = \binom{n}{k}\frac{p(p + \alpha)\cdots[p + (k - 1)\alpha]\; q(q + \alpha)\cdots[q + (n - k - 1)\alpha]}{1(1 + \alpha)\cdots[1 + (n - 1)\alpha]}.$$

Let us take s = -1. This means that the ball drawn at each draw is not
replaced in the urn before drawing the next ball. In this case α = -1/N,
and we have

$$P\{X = k\} = \binom{n}{k}\frac{Np(Np - 1)\cdots[Np - (k - 1)]\; Nq(Nq - 1)\cdots[Nq - (n - k - 1)]}{N(N - 1)\cdots[N - (n - 1)]}$$

(49)  $$= \frac{\binom{Np}{k}\binom{Nq}{n - k}}{\binom{N}{n}},$$

which is a hypergeometric distribution. Here

(50)  $$\max(0, n - Nq) \le k \le \min(n, Np).$$
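The Polya pmf and its reduction to the hypergeometric case can be illustrated numerically. In the sketch below (hypothetical names), `s` is the net change in the number of balls of the drawn color after each draw, so s = -1 corresponds to sampling without replacement and s = 0 to sampling with replacement.

```python
from fractions import Fraction
from math import comb


def polya_pmf(b, c, s, n, k):
    """P{X = k}: k white balls in n draws from an urn with b white, c black balls."""
    total = b + c
    numerator = Fraction(1)
    for i in range(k):          # white factors b, b + s, ..., b + (k - 1)s
        numerator *= b + i * s
    for j in range(n - k):      # black factors c, c + s, ..., c + (n - k - 1)s
        numerator *= c + j * s
    denominator = Fraction(1)
    for i in range(n):          # N, N + s, ..., N + (n - 1)s
        denominator *= total + i * s
    return comb(n, k) * numerator / denominator
```

With s = -1 the values agree with the hypergeometric pmf, and with s = 0 they agree with the b(n, b/N) pmf, provided the denominator factors stay nonzero (for s = -1 this requires n ≤ N).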


Theorem 8. Let X and Y be independent rv's with pmf's b(m, p) and b(n, p),
respectively. Then the conditional distribution of X, given X + Y, is
hypergeometric.

Proof. The proof is left as an exercise. 


For a characterization of the hypergeometric distribution we refer the 
reader to Skibinsky [118]. 


g. The Poisson Distribution 


Definition 3. An rv X is said to be a Poisson rv with parameter λ > 0 if its
pmf is given by

(51)  $$P\{X = k\} = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots.$$

We first check to see that (51) indeed defines a pmf. We have

$$\sum_{k=0}^{\infty} P\{X = k\} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!} = e^{-\lambda}e^{\lambda} = 1.$$

If X has the pmf given by (51), we will write X ~ P(λ). Clearly

$$X = \sum_{k=0}^{\infty} k I_{[X = k]}$$

and

$$F(x) = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}\,\varepsilon(x - k).$$

The mean and the variance are given by (see Problem 3.2.9)

(52)  $$EX = \lambda, \quad EX^2 = \lambda + \lambda^2,$$

and

(53)  $$\operatorname{var}(X) = \lambda.$$

The mgf of X is given by (see Example 3.3.6)

(54)  $$Ee^{tX} = \exp\{\lambda(e^t - 1)\},$$

and the pgf is given by P(s) = e^{-λ(1-s)}, |s| ≤ 1.


Theorem 9. Let X_1, X_2, ..., X_n be independent Poisson rv's with X_k ~ P(λ_k),
k = 1, 2, ..., n. Then S_n = X_1 + X_2 + ... + X_n is a P(λ_1 + λ_2 + ... + λ_n) rv.


Proof. The proof is left as an exercise.


The converse of Theorem 9 is also true. Indeed, Raikov [93] showed that
if X_1, X_2, ..., X_n are independent and S_n = Σ_{i=1}^{n} X_i has a Poisson
distribution, each of the rv's X_1, X_2, ..., X_n has a Poisson distribution.


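Example 9. The number of female insects in a given region follows a
Poisson distribution with mean λ. The number of eggs laid by each insect
is a P(μ) rv. We are interested in the probability distribution of the number
of eggs in the region.

Let F be the number of female insects in the given region. Then

$$P\{F = f\} = \frac{e^{-\lambda}\lambda^f}{f!}, \quad f = 0, 1, 2, \ldots.$$

Let Y be the total number of eggs laid by the insects in the region. Given
F = f, Y is the sum of f independent P(μ) rv's and hence is P(fμ). Then

$$P\{Y = y, F = f\} = P\{F = f\}\,P\{Y = y \mid F = f\} = \frac{e^{-\lambda}\lambda^f}{f!}\cdot\frac{(f\mu)^y e^{-\mu f}}{y!}.$$

Thus

$$P\{Y = y\} = \frac{e^{-\lambda}}{y!}\sum_{f=0}^{\infty}\frac{\lambda^f (f\mu)^y e^{-\mu f}}{f!}, \quad y = 0, 1, 2, \ldots.$$

The mgf of Y is given by

$$M(t) = \sum_{f=0}^{\infty}\frac{e^{-\lambda}\lambda^f}{f!}\sum_{y=0}^{\infty}\frac{e^{ty}(f\mu)^y e^{-\mu f}}{y!} = \sum_{f=0}^{\infty}\frac{e^{-\lambda}\lambda^f}{f!}\exp\{f\mu(e^t - 1)\} = \exp\{\lambda(e^{\mu(e^t - 1)} - 1)\}.$$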

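Theorem 10. Let X and Y be independent rv's with pmf's P(λ_1) and P(λ_2),
respectively. Then the conditional distribution of X, given X + Y, is
binomial.


Proof. For nonnegative integers m and n, m ≤ n, we have

$$P\{X = m \mid X + Y = n\} = \frac{P\{X = m, Y = n - m\}}{P\{X + Y = n\}} = \frac{e^{-\lambda_1}(\lambda_1^m/m!)\,e^{-\lambda_2}(\lambda_2^{n-m}/(n - m)!)}{e^{-(\lambda_1 + \lambda_2)}(\lambda_1 + \lambda_2)^n/n!}$$

$$= \binom{n}{m}\left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^m\left(1 - \frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^{n-m}, \quad m = 0, 1, 2, \ldots, n,$$

and the proof is complete.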
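The conditional distribution of one Poisson rv given the sum of two independent Poisson rv's can be checked numerically against the binomial pmf with success probability λ_1/(λ_1 + λ_2). The sketch below uses floating-point arithmetic and hypothetical function names.

```python
from math import comb, exp, factorial


def poisson_pmf(lam, k):
    """P{X = k} for X ~ P(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


def conditional_pmf(lam1, lam2, n, m):
    """P{X = m | X + Y = n} for independent X ~ P(lam1), Y ~ P(lam2)."""
    joint = poisson_pmf(lam1, m) * poisson_pmf(lam2, n - m)
    return joint / poisson_pmf(lam1 + lam2, n)


def binomial_pmf(n, p, m):
    """P{X = m} for X ~ b(n, p)."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)
```

The two pmf's agree up to rounding error; the exponential factors cancel in the ratio, which is why the conditional law no longer depends on λ_1 + λ_2 except through the ratio λ_1/(λ_1 + λ_2).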


Remark 3. The converse of this result is also true in the following sense. If
X and Y are independent nonnegative integer-valued rv's such that P{X = k}
> 0, P{Y = k} > 0, for k = 0, 1, 2, ..., and the conditional distribution
of X, given X + Y, is binomial, both X and Y are Poisson. This result is due
to Chatterji [12]. For the proof see Problem 13.


Theorem 11. If X ~ P(λ) and the conditional distribution of Y, given
X = x, is b(x, p), then Y is a P(λp) rv.


Proof. The proof is left as an exercise. 


Example 10 (Lamperti and Kruskal [67]). Let N be a nonnegative integer-
valued rv. Independently of each other, N balls are placed either in urn A
with probability p (0 < p < 1) or in urn B with probability 1 - p, resulting
in N_A balls in urn A and N_B = N - N_A balls in urn B. We will show that the
rv's N_A and N_B are independent if and only if N has a Poisson distribution.
We have

$$P\{N_A = a \text{ and } N_B = b \mid N = a + b\} = \binom{a + b}{a} p^a (1 - p)^b,$$

where a, b are integers ≥ 0. Thus

$$P\{N_A = a, N_B = b\} = \binom{n}{a} p^a q^b\,P\{N = n\}, \quad q = 1 - p,\ n = a + b.$$

If N has a Poisson (λ) distribution, then

$$P\{N_A = a, N_B = b\} = \frac{(a + b)!}{a!\,b!}\,p^a q^b\,\frac{e^{-\lambda}\lambda^{a+b}}{(a + b)!} = \left(\frac{e^{-\lambda p}(\lambda p)^a}{a!}\right)\left(\frac{e^{-\lambda q}(\lambda q)^b}{b!}\right),$$

so that N_A and N_B are independent.

Conversely, if N_A and N_B are independent, then

$$P\{N = n\}\,n! = f(a)\,g(b)$$

for some functions f and g. Clearly, f(0) ≠ 0, g(0) ≠ 0 because P{N_A =
0, N_B = 0} > 0. Thus there is a function h such that h(a + b) = f(a) g(b)
for all nonnegative integers a, b. It follows that

$$h(1) = f(1)g(0) = f(0)g(1),$$
$$h(2) = f(2)g(0) = f(1)g(1) = f(0)g(2),$$

and so on. By induction,

$$f(a) = f(1)\left[\frac{g(1)}{g(0)}\right]^{a-1}, \quad g(b) = g(1)\left[\frac{f(1)}{f(0)}\right]^{b-1}.$$

We may write, for some α_1, α_2, λ,

$$f(a) = \alpha_1 e^{-a\lambda}, \quad g(b) = \alpha_2 e^{-b\lambda},$$

$$P\{N = n\} = \alpha_1\alpha_2\,\frac{e^{-\lambda(a + b)}}{n!},$$

so that N is a Poisson rv.
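The two-urn example can be explored numerically: starting from any pmf for N, the joint pmf of (N_A, N_B) is C(a + b, a) p^a (1 - p)^b P{N = a + b}, and independence can be tested by comparing it with the product of its marginals. The sketch below is illustrative only; the function names and the truncation bound `n_max` used to sum the marginals are assumptions.

```python
from math import comb, exp, factorial, isclose


def poisson_pmf(lam, n):
    """P{N = n} for N ~ P(lam)."""
    return exp(-lam) * lam ** n / factorial(n)


def joint_urn_pmf(pmf_N, p, a, b):
    """P{N_A = a, N_B = b} when each of N balls goes to urn A with probability p."""
    return comb(a + b, a) * p ** a * (1 - p) ** b * pmf_N(a + b)


def looks_independent(pmf_N, p, size=6, n_max=60):
    """Compare the joint pmf with the product of marginals on a size x size grid."""
    pa = [sum(joint_urn_pmf(pmf_N, p, a, b) for b in range(n_max + 1)) for a in range(size)]
    pb = [sum(joint_urn_pmf(pmf_N, p, a, b) for a in range(n_max + 1)) for b in range(size)]
    return all(
        isclose(joint_urn_pmf(pmf_N, p, a, b), pa[a] * pb[b], rel_tol=1e-9, abs_tol=1e-12)
        for a in range(size)
        for b in range(size)
    )
```

For Poisson N the check passes, and for, say, N uniform on {0, 1, 2} it fails, in line with the "if and only if" statement. The grid check is of course only numerical evidence, not a proof.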


h. The Multinomial Distribution 


The binomial distribution is generalized in the following natural fashion.
Suppose that an experiment is repeated n times. Each replication of the
experiment terminates in one of k mutually exclusive and exhaustive events
A_1, A_2, ..., A_k. Let p_j be the probability that the experiment terminates in
A_j, j = 1, 2, ..., k, and suppose that p_j (j = 1, 2, ..., k) remains constant for
all n replications. We assume that the n replications are independent.

Let x_1, x_2, ..., x_{k-1} be nonnegative integers such that x_1 + x_2 + ... + x_{k-1} ≤ n.
Then the probability that exactly x_i trials terminate in A_i, i = 1, 2, ..., k - 1,
and hence that x_k = n - (x_1 + x_2 + ... + x_{k-1}) trials terminate in A_k, is
clearly

$$\frac{n!}{x_1!\,x_2!\cdots x_k!}\;p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}.$$

If (X_1, X_2, ..., X_k) is a random vector such that X_j = x_j means that event
A_j has occurred x_j times, x_j = 0, 1, 2, ..., n, the joint pmf of (X_1, X_2, ..., X_k)
is given by

(55)  $$P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} = \begin{cases} \dfrac{n!}{x_1!\,x_2!\cdots x_k!}\;p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k} & \text{if } n = \sum_{i=1}^{k} x_i, \\ 0 & \text{otherwise.} \end{cases}$$

Definition 4. An rv (X_1, X_2, ..., X_{k-1}) with joint pmf given by

(56)  $$P\{X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}\} = \begin{cases} \dfrac{n!}{x_1!\,x_2!\cdots(n - x_1 - \cdots - x_{k-1})!}\;p_1^{x_1} p_2^{x_2}\cdots p_k^{\,n - x_1 - \cdots - x_{k-1}} & \text{if } x_1 + x_2 + \cdots + x_{k-1} \le n, \\ 0 & \text{otherwise,} \end{cases}$$

is said to have a multinomial distribution.


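For the mgf of (X_1, X_2, ..., X_{k-1}) we have

$$M(t_1, t_2, \ldots, t_{k-1}) = Ee^{t_1X_1 + t_2X_2 + \cdots + t_{k-1}X_{k-1}}$$

$$= \sum_{x_1 + \cdots + x_{k-1} \le n} e^{t_1x_1 + \cdots + t_{k-1}x_{k-1}}\;\frac{n!}{x_1!\,x_2!\cdots x_k!}\;p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}$$

$$= \sum_{x_1 + \cdots + x_{k-1} \le n} \frac{n!}{x_1!\,x_2!\cdots x_k!}\;(p_1e^{t_1})^{x_1}(p_2e^{t_2})^{x_2}\cdots(p_{k-1}e^{t_{k-1}})^{x_{k-1}}\,p_k^{x_k}$$

(57)  $$= (p_1e^{t_1} + p_2e^{t_2} + \cdots + p_{k-1}e^{t_{k-1}} + p_k)^n$$

for all t_1, t_2, ..., t_{k-1} ∈ ℛ.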
Clearly

$$M(t_1, 0, 0, \ldots, 0) = (p_1e^{t_1} + p_2 + \cdots + p_k)^n = (1 - p_1 + p_1e^{t_1})^n,$$

which is binomial. Indeed, the marginal pmf of each X_i, i = 1, 2, ..., k - 1,
is binomial. Similarly, the joint mgf of X_i, X_j, i, j = 1, 2, ..., k - 1 (i ≠ j) is

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = [p_ie^{t_i} + p_je^{t_j} + (1 - p_i - p_j)]^n,$$

which is the mgf of a trinomial distribution with pmf

(58)  $$f(x_i, x_j) = \frac{n!}{x_i!\,x_j!\,(n - x_i - x_j)!}\;p_i^{x_i} p_j^{x_j} p_k^{\,n - x_i - x_j}, \quad p_k = 1 - p_i - p_j.$$

Note that the rv's X_1, X_2, ..., X_{k-1} are dependent.
From the mgf of (X_1, X_2, ..., X_{k-1}) or directly from the marginal pmf's we
can compute the moments. Thus

(59)  $$EX_j = np_j \quad \text{and} \quad \operatorname{var}(X_j) = np_j(1 - p_j), \quad j = 1, 2, \ldots, k - 1,$$

and for j = 1, 2, ..., k - 1, and i ≠ j,

(60)  $$\operatorname{cov}(X_i, X_j) = E\{(X_i - np_i)(X_j - np_j)\} = -np_ip_j.$$

It follows that the correlation coefficient between X_i and X_j is given by

(61)  $$\rho_{ij} = -\left[\frac{p_ip_j}{(1 - p_i)(1 - p_j)}\right]^{1/2}, \quad i, j = 1, 2, \ldots, k - 1\ (i \ne j).$$
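The moment formulas for the multinomial distribution, EX_j = np_j, var (X_j) = np_j(1 - p_j) and cov (X_i, X_j) = -np_ip_j, can be verified for small n by enumerating the joint pmf. The sketch below (hypothetical names, exact fractions) lists all count vectors summing to n.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(n, probs):
    """Dictionary mapping count vectors (x_1, ..., x_k) with sum n to probabilities."""
    k = len(probs)
    pmf = {}
    for head in product(range(n + 1), repeat=k - 1):
        last = n - sum(head)
        if last < 0:
            continue
        counts = head + (last,)
        coefficient = factorial(n)
        for c in counts:
            coefficient //= factorial(c)   # exact: partial quotients are integers
        prob = Fraction(coefficient)
        for c, p in zip(counts, probs):
            prob *= p ** c
        pmf[counts] = prob
    return pmf


def expectation(pmf, func):
    """E func(X_1, ..., X_k) under the enumerated pmf."""
    return sum(prob * func(counts) for counts, prob in pmf.items())


def covariance(pmf, i, j):
    """cov(X_i, X_j), with 0-based indices i and j."""
    mean_i = expectation(pmf, lambda x: x[i])
    mean_j = expectation(pmf, lambda x: x[j])
    return expectation(pmf, lambda x: x[i] * x[j]) - mean_i * mean_j
```

Calling `covariance(pmf, i, i)` returns the variance of X_i, so the same helper covers both (59) and (60).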
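Example 11. Consider the trinomial distribution with pmf

$$P\{X = x, Y = y\} = \frac{n!}{x!\,y!\,(n - x - y)!}\;p_1^x p_2^y p_3^{\,n - x - y},$$

where x, y are nonnegative integers such that x + y ≤ n, and p_1, p_2, p_3 > 0
with p_1 + p_2 + p_3 = 1. The marginal pmf of X is given by

$$P\{X = x\} = \binom{n}{x} p_1^x (1 - p_1)^{n-x}, \quad x = 0, 1, 2, \ldots, n.$$

It follows that

(62)  $$P\{Y = y \mid X = x\} = \begin{cases} \dfrac{(n - x)!}{y!\,(n - x - y)!}\left(\dfrac{p_2}{1 - p_1}\right)^y\left(\dfrac{p_3}{1 - p_1}\right)^{n - x - y} & \text{if } y = 0, 1, 2, \ldots, n - x, \\ 0 & \text{otherwise,} \end{cases}$$

which is b(n - x, p_2/(1 - p_1)). Thus

(63)  $$E\{Y \mid x\} = (n - x)\,\frac{p_2}{1 - p_1}.$$

Similarly,

(64)  $$E\{X \mid y\} = (n - y)\,\frac{p_1}{1 - p_2}.$$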


Finally we note that, if X = (X_1, X_2, ..., X_k) and Y = (Y_1, Y_2, ..., Y_k) are
two independent multinomial rv's with common parameter (p_1, p_2, ..., p_k),
then Z = X + Y is also a multinomial rv with probabilities (p_1, p_2, ..., p_k).
This follows easily if one employs the mgf technique, using (57). Actually
this property characterizes the multinomial distribution. If X and Y are
k-dimensional, nonnegative, independent random vectors, and if Z = X + Y
is a multinomial random vector with parameter (p_1, p_2, ..., p_k), then X and
Y also have multinomial distribution with the same parameter. This result is
due to Shanbhag and Basawa [113] and will not be proved here.

PROBLEMS 5.2

1. (a) Let us write

$$b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \ldots, n.$$

Show that, as k goes from 0 to n, b(k; n, p) first increases monotonically and then
decreases monotonically. The greatest value is assumed when k = m, where m is
an integer such that

$$(n + 1)p - 1 < m \le (n + 1)p,$$

except that b(m - 1; n, p) = b(m; n, p) when m = (n + 1)p.

(b) If k ≥ np, then

$$P\{X \ge k\} \le b(k; n, p)\,\frac{(k + 1)(1 - p)}{k + 1 - (n + 1)p},$$

and if k ≤ np, then

$$P\{X \le k\} \le b(k; n, p)\,\frac{(n - k + 1)p}{(n + 1)p - k}.$$

2. Generalize the result in Theorem 10 to n independent Poisson rv's, that is, if
X_1, X_2, ..., X_n are independent rv's with X_i ~ P(λ_i), i = 1, 2, ..., n, the conditional
distribution of X_1, X_2, ..., X_n, given Σ_{i=1}^{n} X_i = t, is multinomial with parameters
t, λ_1/Σ_{i=1}^{n} λ_i, ..., λ_n/Σ_{i=1}^{n} λ_i.

3. Let X_1, X_2 be independent rv's with X_i ~ b(n_i, 1/2), i = 1, 2. What is the pmf
of X_1 - X_2 + n_2?

4. A box contains N identical balls numbered 1 through N. Of these balls, n are
drawn at a time. Let X_1, X_2, ..., X_n denote the numbers on the n balls drawn. Let
S_n = Σ_{i=1}^{n} X_i. Find var (S_n).

5. From a box containing N identical balls marked 1 through N, M balls are
drawn one after another without replacement. Let X_i denote the number on the
ith ball drawn, i = 1, 2, ..., M, 1 ≤ M ≤ N. Let Y = max (X_1, X_2, ..., X_M). Find
the df and the pmf of Y. Also find the conditional distribution of X_1, X_2, ..., X_M,
given Y = y. Find EY and var (Y).

6. Let f(x; r, p), x = 0, 1, 2, ..., denote the pmf of an NB(r; p) rv. Show that the
terms f(x; r, p) first increase monotonically and then decrease monotonically.
When is the greatest value assumed?


7. Show that the terms

$$P\{X = k\} = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots,$$

of the Poisson pmf reach their maxima when k is the largest integer ≤ λ.


8. Show that

$$\binom{n}{k} p^k (1 - p)^{n-k} \to \frac{e^{-\lambda}\lambda^k}{k!}$$

as n → ∞ and p → 0, so that np = λ remains constant.
(Hint: Use Stirling's approximation, namely, n! ≈ √(2π) n^{n+1/2} e^{-n} as n → ∞.)

9. A biased coin is tossed indefinitely. Let p (0 < p < 1) be the probability of
success (heads). Let Y_1 denote the length of the first run, and Y_2, the length of
the second run. Find the pmf's of Y_1 and Y_2 and show that EY_1 = q/p + p/q,
EY_2 = 2. If Y_n denotes the length of the nth run, n ≥ 1, what is the pmf of Y_n?
Find EY_n.

10. Show that

$$\frac{\binom{Np}{k}\binom{N(1 - p)}{n - k}}{\binom{N}{n}} \to \binom{n}{k} p^k (1 - p)^{n-k}$$

as N → ∞.


11. Show that

$$\binom{r + k - 1}{k} p^r (1 - p)^k \to \frac{e^{-\lambda}\lambda^k}{k!}$$

as p → 1 and r → ∞ in such a way that r(1 - p) = λ remains fixed.


12. Let X and Y be independent geometric rv's. Show that min (X, Y) and 
X — Y are independent. 


13. Let X and Y be independent rv's with pmf's P{X = k} = p_k, P{Y = k} = q_k,
k = 0, 1, 2, ..., where p_k, q_k > 0 and Σ_{k=0}^{∞} p_k = Σ_{k=0}^{∞} q_k = 1. Let

$$P\{X = k \mid X + Y = t\} = \binom{t}{k}\alpha_t^k (1 - \alpha_t)^{t-k}, \quad 0 \le k \le t.$$

Then α_t = α for all t, and

$$p_k = \frac{e^{-\theta\beta}(\theta\beta)^k}{k!}, \quad q_k = \frac{e^{-\theta}\theta^k}{k!},$$

where β = α/(1 - α), and θ > 0 is arbitrary.

(Chatterji [12])

14. Generalize the result of Example 10 to the case of k urns, k ≥ 3.

15. Let (X_1, X_2, ..., X_{k-1}) have a multinomial distribution with parameters
n, p_1, p_2, ..., p_{k-1}. Write

$$Y = \sum_{i=1}^{k}\frac{(X_i - np_i)^2}{np_i},$$

where p_k = 1 - p_1 - ... - p_{k-1} and X_k = n - X_1 - ... - X_{k-1}. Find EY and
var (Y).
16. Let X_1, X_2 be iid rv's with common df F, having positive mass at 0, 1, 2, ....
Also, let U = max (X_1, X_2) and V = X_1 - X_2. Then

$$P\{U = j, V = 0\} = P\{U = j\}\,P\{V = 0\}$$

for all j if and only if F is a geometric distribution.
(Srivastava [122])

17. Let X and Y be mutually independent rv's, taking nonnegative integer values.
Then

$$P\{X \le n\} - P\{X + Y \le n\} = \alpha P\{X + Y = n\}$$

holds for n = 0, 1, 2, ... and some α > 0 if and only if

$$P\{Y = n\} = \frac{1}{1 + \alpha}\left(\frac{\alpha}{1 + \alpha}\right)^n, \quad n = 0, 1, 2, \ldots.$$

(Puri [92])
(Hint: Use Problem 3.3.8.)


18. Let X_1, X_2, ... be a sequence of independent b(1, p) rv's with 0 < p < 1.
Also, let Z_N = Σ_{i=1}^{N} X_i, where N is a P(λ) rv which is independent of the X_i's.
Show that Z_N and N - Z_N are independent.


19. Prove Theorems 5 and 7. 
20. Prove Theorem 8. 
21. Prove Theorem 1.


5.3 SOME CONTINUOUS DISTRIBUTIONS


In this section we study some of the most frequently used absolutely continuous
distributions and describe their important properties.


a. The Uniform Distribution (Rectangular Distribution) 


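Definition 1. An rv X is said to have a uniform distribution on the interval
[a, b], -∞ < a < b < ∞, if its pdf is given by

(1)  $$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

We will write X ~ U[a, b] if X has a uniform distribution on [a, b].
The end point a or b or both end points may be excluded. Clearly,

$$\int_{-\infty}^{\infty} f(x)\,dx = 1,$$

so that (1) indeed defines a pdf. The df of X is given by

(2)  $$F(x) = \begin{cases} 0, & x < a, \\ \dfrac{x - a}{b - a}, & a \le x < b, \\ 1, & b \le x; \end{cases}$$

(3)  $$EX = \frac{a + b}{2}, \quad EX^k = \frac{b^{k+1} - a^{k+1}}{(k + 1)(b - a)}, \quad k > 0 \text{ is an integer};$$

(4)  $$\operatorname{var}(X) = \frac{(b - a)^2}{12};$$

(5)  $$M(t) = \frac{1}{t(b - a)}\,(e^{tb} - e^{ta}), \quad t \ne 0.$$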

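Example 1. Let X have the density given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \infty > x > 0,\ \lambda > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$F(x) = \begin{cases} 0, & x \le 0, \\ 1 - e^{-\lambda x}, & x > 0. \end{cases}$$

Let Y = F(X) = 1 - e^{-λX}. The pdf of Y is given by

$$f_Y(y) = \frac{1}{\lambda}\cdot\frac{1}{1 - y}\cdot\lambda\exp\left\{-\lambda\left(-\frac{1}{\lambda}\log(1 - y)\right)\right\} = 1, \quad 0 \le y < 1.$$

Let us define f_Y(y) = 1 at y = 1. Then we see that Y has the density
function

$$f_Y(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{otherwise,} \end{cases}$$

which is the U[0, 1] distribution. That this is not a mere coincidence is shown
in the following theorem.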


Theorem 1 (Probability Integral Transformation). Let X be an rv with a 
continuous df F. Then F(X) has the uniform distribution on [0,1]. 


Proof. The proof is left as an exercise.
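The probability integral transformation can be illustrated by simulation. The sketch below assumes, purely for illustration, an exponential df F(x) = 1 - e^{-λx}; it draws X with Python's standard `random.Random.expovariate` and returns the values F(X), which should behave like a U[0, 1] sample. The function names and the choice of seed are not from the text.

```python
import math
import random


def exponential_df(x, lam):
    """F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def pit_sample(n, lam, seed=0):
    """Draw X_1, ..., X_n from the exponential df and return F(X_1), ..., F(X_n)."""
    rng = random.Random(seed)
    return [exponential_df(rng.expovariate(lam), lam) for _ in range(n)]


def sample_mean_and_variance(values):
    """Sample mean and (biased) sample variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance
```

For a large sample the mean and variance of the transformed values should be close to 1/2 and 1/12, the moments of U[0, 1]. A simulation only suggests the result; it does not replace the proof.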


The reader is asked to consider what happens in the case where F is the 
df of a discrete rv. In the converse direction the following result holds. 


Theorem 2. Let F be any df, and let X be a U[0, 1] rv. Then there exists a
function h such that h(X) has df F, that is,

(6)  $$P\{h(X) \le x\} = F(x) \quad \text{for all } x \in (-\infty, \infty).$$


Proof. If F is the df of a discrete rv Y, let

$$P\{Y = y_k\} = p_k, \quad k = 1, 2, \ldots.$$

Define h as follows:

$$h(x) = \begin{cases} y_1 & \text{if } 0 \le x < p_1, \\ y_2 & \text{if } p_1 \le x < p_1 + p_2, \\ \ \vdots & \end{cases}$$

Then

$$P\{h(X) = y_1\} = P\{0 \le X < p_1\} = p_1,$$
$$P\{h(X) = y_2\} = P\{p_1 \le X < p_1 + p_2\} = p_2,$$

and, in general,

$$P\{h(X) = y_k\} = p_k, \quad k = 1, 2, \ldots.$$

Thus h(X) is a discrete rv with df F.

If F is continuous and strictly increasing, F^{-1} is well defined, and we take
h(X) = F^{-1}(X). We have

$$P\{h(X) \le x\} = P\{F^{-1}(X) \le x\} = P\{X \le F(x)\} = F(x),$$

as asserted.

In general, define

(7)  $$F^{-1}(y) = \inf\{x : F(x) \ge y\},$$

and let h(X) = F^{-1}(X). Then we have

(8)  $$\{F^{-1}(y) \le x\} = \{y \le F(x)\}.$$

F^{-1}(y) ≤ x implies that, for every ε > 0, y ≤ F(x + ε). Since ε > 0 is
arbitrary and F is continuous on the right, we let ε → 0 and conclude that
y ≤ F(x). Since y ≤ F(x) implies F^{-1}(y) ≤ x by definition (7), it follows
that (8) holds generally. Thus

$$P\{F^{-1}(X) \le x\} = P\{X \le F(x)\} = F(x).$$

Theorem 2 is quite useful in generating samples with the help of the
uniform distribution.
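The discrete construction in the proof of Theorem 2 translates directly into a sampling routine: the interval [0, 1) is cut into consecutive pieces of lengths p_1, p_2, ..., and h maps a uniform value to the y_k whose piece contains it. The following Python sketch uses hypothetical names; clamping the index guards against floating-point round-off in the cumulative sums and is an implementation detail, not part of the theorem.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def make_h(values, probs):
    """Return h with h(u) = y_k when p_1 + ... + p_{k-1} <= u < p_1 + ... + p_k."""
    cumulative = list(accumulate(probs))

    def h(u):
        # bisect_right counts the cumulative sums that are <= u.
        index = bisect_right(cumulative, u)
        return values[min(index, len(values) - 1)]

    return h


def sample(h, n, seed=0):
    """Apply h to n pseudo-random U[0, 1) values."""
    rng = random.Random(seed)
    return [h(rng.random()) for _ in range(n)]
```

For example, `make_h(["a", "b", "c"], [0.5, 0.25, 0.25])` returns a function that maps [0, 0.5) to "a", [0.5, 0.75) to "b" and [0.75, 1) to "c", so the relative frequencies in a large sample should be near 0.5, 0.25 and 0.25.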
Example 2. Let F be the df defined by

$$F(x) = \begin{cases} 0, & x \le 0, \\ 1 - e^{-x}, & x > 0. \end{cases}$$

Then the inverse to y = 1 - e^{-x}, x > 0, is x = -log (1 - y), 0 < y < 1. Thus

$$h(y) = -\log(1 - y),$$

and -log (1 - X) has the required distribution, where X is a U[0, 1] rv.


Theorem 3. Let X be an rv defined on [0, 1]. If P{x < X ≤ y} depends only
on the length y - x for all 0 ≤ x ≤ y ≤ 1, then X is U[0, 1]. (P{X = 0} = 0.)


Proof. Let P{x < X ≤ y} = f(y - x); then f(x + y) = P{0 < X ≤ x + y}
= P{0 < X ≤ x} + P{x < X ≤ x + y} = f(x) + f(y). Note that f is
continuous from the right. We have

$$f(x) = f(x) + f(0),$$

so that

$$f(0) = 0.$$

Also

$$0 = f(0) = f(x - x) = f(x) + f(-x),$$

so that

$$f(-x) = -f(x).$$

We will show that f(x) = cx for some constant c. It suffices to prove the
result for positive x. Let m be an integer; then

$$f(x + x + \cdots + x) = f(x) + \cdots + f(x) = mf(x).$$

Letting x = n/m, we get

$$f\left(m\cdot\frac{n}{m}\right) = mf\left(\frac{n}{m}\right),$$

so that

$$f\left(\frac{n}{m}\right) = \frac{1}{m}f(n) = \frac{n}{m}f(1),$$

for positive integers n and m. Letting f(1) = c, we have proved that

$$f(x) = cx$$

for rational numbers x.

To complete the proof we consider the case where x is a positive
irrational number. Then we can find a decreasing sequence of positive
rationals x_1, x_2, ... such that x_n → x. Since f is right continuous,

$$f(x) = \lim_{x_n \downarrow x} f(x_n) = \lim_{x_n \downarrow x} cx_n = cx.$$

Now, for 0 ≤ x ≤ 1,

$$F(x) = P\{X \le 0\} + P\{0 < X \le x\} = F(0) + P\{0 < X \le x\} = f(x) = cx, \quad 0 \le x \le 1.$$

Since F(1) = 1, we must have c = 1, so that

$$F(x) = x, \quad 0 \le x \le 1.$$

This completes the proof.


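b. The Gamma Distribution

For $\alpha > 0$, $\Gamma(\alpha)$ is defined by

(9)  $\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx.$

In particular, if $\alpha = 1$, $\Gamma(1) = 1$. If $\alpha > 1$, integration by parts yields

(10)  $\Gamma(\alpha) = (\alpha - 1) \int_0^\infty x^{\alpha - 2} e^{-x}\,dx = (\alpha - 1)\Gamma(\alpha - 1).$

If $\alpha = n$ is a positive integer, then

(11)  $\Gamma(n) = (n - 1)!.$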

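Let us write $x = y/\beta$, $\beta > 0$, in the integral in (9). Then

(12)  $\Gamma(\alpha) = \int_0^\infty \frac{y^{\alpha - 1}}{\beta^{\alpha}} e^{-y/\beta}\,dy,$

so that

(13)  $\int_0^\infty \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha - 1} e^{-y/\beta}\,dy = 1.$

Since the integrand in (13) is positive for $y > 0$, it follows that the function

(14)  $f(y) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha - 1} e^{-y/\beta}$ for $0 < y < \infty$, and $f(y) = 0$ for $y \le 0$,

defines a pdf for $\alpha > 0$, $\beta > 0$.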


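Definition 2. An rv $X$ with pdf defined by (14) is said to have a gamma
distribution with parameters $\alpha$ and $\beta$. We will write $X \sim G(\alpha, \beta)$.


The df of a $G(\alpha, \beta)$ rv is given by

(15)  $F(x) = 0$ for $x \le 0$, and $F(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^x y^{\alpha - 1} e^{-y/\beta}\,dy$ for $x > 0$.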


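The mgf of $X$ is easily computed. We have

$M(t) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^\infty e^{x(t - 1/\beta)} x^{\alpha - 1}\,dx$

$= \left(\frac{1}{1 - \beta t}\right)^{\alpha} \int_0^\infty \frac{y^{\alpha - 1} e^{-y}}{\Gamma(\alpha)}\,dy$

(16)  $= (1 - \beta t)^{-\alpha}, \quad t < \frac{1}{\beta}.$

It follows that
(17)  $EX = M'(t)|_{t=0} = \alpha\beta,$
(18)  $EX^2 = M''(t)|_{t=0} = \alpha(\alpha + 1)\beta^2,$
so that
(19)  $\mathrm{var}(X) = \alpha\beta^2.$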


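Indeed, we can compute the moment of order $n$ directly from the density.
We have

$EX^n = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^\infty e^{-x/\beta} x^{\alpha + n - 1}\,dx$

$= \beta^n\, \frac{\Gamma(\alpha + n)}{\Gamma(\alpha)}$

(20)  $= \beta^n (\alpha + n - 1)(\alpha + n - 2) \cdots \alpha.$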
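As a quick numerical sanity check of (17) through (20), the two forms of the $n$th moment can be compared directly. The sketch below is illustrative only; the function names are hypothetical and `math.gamma` from the Python standard library plays the role of $\Gamma$.

```python
import math


def gamma_moment(alpha, beta, n):
    """n-th moment of G(alpha, beta) as beta^n * Gamma(alpha + n) / Gamma(alpha)."""
    return beta ** n * math.gamma(alpha + n) / math.gamma(alpha)


def gamma_moment_product(alpha, beta, n):
    """Same moment via the product form (20): beta^n (alpha+n-1)...(alpha)."""
    result = beta ** n
    for j in range(n):
        result *= alpha + j
    return result


if __name__ == "__main__":
    alpha, beta = 2.5, 3.0
    mean = gamma_moment(alpha, beta, 1)                 # alpha * beta
    var = gamma_moment(alpha, beta, 2) - mean ** 2      # alpha * beta^2
    print(mean, var, gamma_moment_product(alpha, beta, 3))
```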
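The special case in which $\alpha = 1$ leads to the exponential distribution with
parameter $\beta$. The pdf of an exponentially distributed rv is therefore

(21)  $f(x) = \beta^{-1} e^{-x/\beta}$ for $x > 0$, and $f(x) = 0$ otherwise.

Note that we can talk of the exponential distribution on $(-\infty, 0)$. The pdf
of such an rv is

(22)  $f(x) = \beta^{-1} e^{x/\beta}$ for $x < 0$, and $f(x) = 0$ for $x \ge 0$.

Clearly, if $X \sim G(1, \beta)$, we have

(23)  $EX^n = n!\,\beta^n,$

(24)  $EX = \beta$ and $\mathrm{var}(X) = \beta^2,$

(25)  $M(t) = (1 - \beta t)^{-1}$ for $t < \beta^{-1}.$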


Another special case of importance is that in which $\alpha = n/2$ ($n > 0$ an
integer) and $\beta = 2$.


Definition 3. An rv $X$ is said to have a chi-square distribution ($\chi^2$-distribution)
with $n$ degrees of freedom if its pdf is given by

(26)  $f(x) = \frac{1}{\Gamma(n/2)\,2^{n/2}}\, e^{-x/2} x^{n/2 - 1}$ for $0 < x < \infty$, and $f(x) = 0$ for $x \le 0$.


We will write $X \sim \chi^2(n)$ for a $\chi^2$ rv with $n$ degrees of freedom (d.f.). [Note
the difference in the abbreviations of distribution function (df) and degrees
of freedom (d.f.).]


If $X \sim \chi^2(n)$, then

(27)  $EX = n, \quad \mathrm{var}(X) = 2n,$

(28)  $EX^k = \frac{2^k\, \Gamma[(n/2) + k]}{\Gamma(n/2)},$
and
(29)  $M(t) = (1 - 2t)^{-n/2}$ for $t < \tfrac{1}{2}.$


Theorem 4. Let $X_1, X_2, \ldots, X_n$ be independent rv's such that $X_j \sim G(\alpha_j, \beta)$,
$j = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^n X_k$ is a $G(\sum_{j=1}^n \alpha_j, \beta)$ rv.


Corollary 1. Let $X_1, X_2, \ldots, X_n$ be iid rv's, each with an exponential
distribution with parameter $\beta$. Then $S_n$ is a $G(n, \beta)$ rv.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent rv's such that $X_j \sim \chi^2(r_j)$,
$j = 1, 2, \ldots, n$, then $S_n$ is a $\chi^2(\sum_{j=1}^n r_j)$ rv.


The proof of Theorem 4 is simple, and Corollaries 1 and 2 follow immediately.
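One way to see Theorem 4, sketched here with the mgf (16) and the independence of the $X_j$ (the uniqueness of the mgf is taken for granted):

```latex
M_{S_n}(t) = \prod_{j=1}^{n} M_{X_j}(t)
           = \prod_{j=1}^{n} (1 - \beta t)^{-\alpha_j}
           = (1 - \beta t)^{-\sum_{j=1}^{n} \alpha_j},
\qquad t < \frac{1}{\beta},
```

which is the mgf of a $G(\sum_j \alpha_j, \beta)$ rv. Taking every $\alpha_j = 1$ gives Corollary 1, and taking $\alpha_j = r_j/2$, $\beta = 2$ gives Corollary 2.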




Theorem 5. Let $X \sim U(0, 1)$. Then $Y = -2 \log X$ is $\chi^2(2)$.


Corollary. Let $X_1, X_2, \ldots, X_n$ be iid rv's with common distribution $U(0, 1)$.
Then $-2 \sum_{i=1}^n \log X_i = -2 \log(\prod_{i=1}^n X_i)$ is $\chi^2(2n)$.


For proof see Example 2.5.5. The corollary follows from Corollary 2 to
Theorem 4.
Theorem 6. Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent rv's.
Then $X + Y$ and $X/Y$ are independent.


Corollary. Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent rv's.
Then $X + Y$ and $X/(X + Y)$ are independent.


Proof. The proof is straightforward and is left as an exercise.


The converse of Theorem 6 is also true. The result is due to Lukacs,
and we state it without proof.


Theorem 7. Let $X$ and $Y$ be two nondegenerate rv's that take only positive
values. Suppose that $U = X + Y$ and $V = X/Y$ are independent. Then $X$ and
$Y$ have gamma distribution with the same parameter $\beta$.

Theorem 8. Let $X \sim G(1, \beta)$. Then the rv $X$ has "no memory," that is,
(30)  $P\{X > r + s \mid X > s\} = P\{X > r\},$
for any two positive real numbers $r$ and $s$.
Proof. The proof is left as an exercise.
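A sketch of the computation behind (30), using only the exponential tail $P\{X > x\} = e^{-x/\beta}$ for $x > 0$:

```latex
P\{X > r + s \mid X > s\}
  = \frac{P\{X > r + s\}}{P\{X > s\}}
  = \frac{e^{-(r+s)/\beta}}{e^{-s/\beta}}
  = e^{-r/\beta}
  = P\{X > r\}.
```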


The converse of Theorem 8 is also true in the following sense. 


Theorem 9. Let $F$ be a df such that $F(x) = 0$ if $x < 0$, $F(x) < 1$ if $x > 0$,
and

(31)  $\frac{1 - F(x + y)}{1 - F(y)} = 1 - F(x)$ for all $x, y > 0$.

Then there exists a constant $\beta > 0$ such that
(32)  $1 - F(x) = e^{-x/\beta}, \quad x > 0.$


Proof. Equation (31) is equivalent to
$g(x + y) = g(x) + g(y)$

if we write $g(x) = \log\{1 - F(x)\}$. From the proof of Theorem 3 it is clear
that the only right continuous solution is $g(x) = cx$. Hence $F(x) = 1 - e^{cx}$,
$x \ge 0$. Since $F(x) \to 1$ as $x \to \infty$, it follows that $c < 0$ and the proof is
complete.


Theorem 10. Let $X_1, X_2, \ldots, X_n$ be iid rv's. Then $X_i \sim G(1, n\beta)$, $i = 1,
2, \ldots, n$, if and only if $N_n = \min\{X_1, X_2, \ldots, X_n\}$ is $G(1, \beta)$.

Proof. The proof is left as an exercise.

Note that, if $X_1, X_2, \ldots, X_n$ are independent with $X_i \sim G(1, \beta_i)$, $i = 1,
2, \ldots, n$, then $N_n$ is a $G(1, 1/\sum_{i=1}^n \beta_i^{-1})$ rv.


Theorem 11 (Desu). Let $X_1, X_2, \ldots, X_n$ be iid nondegenerate rv's with
common df $F$, and let $N_n = \min(X_1, X_2, \ldots, X_n)$. Then $X_1$ and $nN_n$ are
identically distributed if and only if

(33)  $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$,
where $\lambda$ is a positive constant.


Proof. If
$F(x) = 1 - e^{-\lambda x}$ if $x \ge 0$, and $F(x) = 0$ if $x < 0$,

then, for $y \ge 0$,
(34)  $F_{nN_n}(y) = P\{N_n \le y/n\} = 1 - [P\{X_1 > y/n\}]^n = 1 - e^{-\lambda y} = F(y).$

Conversely, let
(35)  $F_{nN_n}(y) = F(y)$ for real $y$.


Let $G(y) = 1 - F(y)$. Then

$1 - F_{nN_n}(y) = [1 - F(y/n)]^n = G^n(y/n),$

and it follows that

(36)  $G(ny) = G^n(y)$ for real $y$ and all integers $n \ge 1$.

Thus $G(0) = G^n(0)$, and we must have $G(0) = 0$ or $G(0) = 1$. We show
that $G(0) = 1$.

Let $y_0 < 0$ be such that $G(y_0) < 1$. Then $G^n(y_0) \to 0$ and $G(ny_0) \to 1$ as $n \to \infty$,
which is a contradiction. Thus, for all $y < 0$, $G(y) \ge 1$, and hence
$G(y) = 1$ for $y < 0$. Since $F$ is nondegenerate, we cannot have $G(y) = 0$
for all $y \ge 0$, so that $G(y_1) > 0$ for some $y_1 \ge 0$. If $y_1 = 0$, then
$G(0) > 0$, and hence $G(0) = 1$. On the other hand, if $G(y_1) > 0$ for $y_1 > 0$,
then $G(y) > 0$ for $0 \le y \le y_1$ since $G$ is nonincreasing. Since $G$ is right
continuous, it follows that $G(0) = 1$. Thus

(37)  $G(y) = 1$ for $y \le 0$, and $G(y) > 0$ for $0 < y \le y_1$.

Suppose that there is a constant $y_2 > y_1$ for which $G(y_2) = 0$. Then, from

(38)  $G(y_2) = G^n\left(\frac{y_2}{n}\right) = 0$ for $n \ge 1$,

it follows that $G(y) = 0$ for $y \ge y_2/n$; therefore, for sufficiently large $n$,
$G(y) = 0$ when $y_2/n \le y \le y_1$, which contradicts (37). Hence we have shown
that

(39)  $G(y) > 0$ for $y \ge 0$.

From (36), we have
$G^n(1) = G(n) = G\left(m \cdot \frac{n}{m}\right) = G^m\left(\frac{n}{m}\right),$
so that
(40)  $G\left(\frac{n}{m}\right) = G^{n/m}(1)$ for $n, m \ge 1$ ($n, m$ integers).


Since $G(y)$ and $\{G(1)\}^y$ are both nonincreasing and coincide for positive
rational numbers $y$, and since $\{G(1)\}^y$ is continuous, it follows that

(41)  $G(y) = \{G(1)\}^y = e^{y \log G(1)}$ for all real $y > 0$.

Since $F$ is a nondegenerate df,
(42)  $\lim_{y \to \infty} G(y) = 0,$
so that $G(1) < 1$. Hence

(43)  $G(y) = e^{-\lambda y}$ for $y > 0$, and $G(y) = 1$ for $y \le 0$,

where $\lambda = -\log G(1)$.


Corollary. Let $Y$ be a nondegenerate rv with df $H$, where $H(0) = 0$. Also,
let $Y_1, Y_2, \ldots, Y_n$ be iid rv's with common df $H$. Then $Y$ and $M_n^n = \max
(Y_1^n, \ldots, Y_n^n)$ have the same df $H$ (for each integer $n \ge 2$) if and only if

$H(y) = y^{\lambda}, \quad 0 \le y \le 1,$ for some $\lambda > 0$.

For proof take $X = -\log Y$ in Theorem 11.


The following result describes the relationship between exponential and 
Poisson rv's. 


Theorem 12. Let $X_1, X_2, \ldots$ be a sequence of iid rv's having common
exponential density with parameter $\beta > 0$. Let $S_n = \sum_{k=1}^n X_k$ be the $n$th
partial sum, $n = 1, 2, \ldots$, and suppose that $t > 0$. If $Y$ = number of
$S_n \in [0, t]$, then $Y$ is a $P(t/\beta)$ rv.


Proof. We have

$P\{Y = 0\} = P\{S_1 > t\} = \frac{1}{\beta} \int_t^\infty e^{-x/\beta}\,dx = e^{-t/\beta},$

so that the assertion holds for $Y = 0$. Let $n$ be a positive integer. Since the
$X_i$'s are nonnegative, $S_n$ is nondecreasing, and

(44)  $P\{Y = n\} = P\{S_n \le t,\ S_{n+1} > t\}.$

Now

(45)  $P\{S_n \le t\} = P\{S_n \le t,\ S_{n+1} > t\} + P\{S_{n+1} \le t\}.$

It follows that

(46)  $P\{Y = n\} = P\{S_n \le t\} - P\{S_{n+1} \le t\},$

and, since $S_n \sim G(n, \beta)$, we have

$P\{Y = n\} = \int_0^t \frac{1}{\Gamma(n)\beta^n}\, x^{n-1} e^{-x/\beta}\,dx - \int_0^t \frac{1}{\Gamma(n+1)\beta^{n+1}}\, x^{n} e^{-x/\beta}\,dx$

$= \frac{t^n e^{-t/\beta}}{\beta^n\, n!},$

as asserted.
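The identity (46) can be checked numerically. The sketch below is an illustration, not part of the proof: it integrates the $G(n, \beta)$ density with a composite Simpson rule (an assumed choice of quadrature) and compares the difference of the two df's with the Poisson probability. All function names are hypothetical.

```python
import math


def gamma_pdf(x, n, beta):
    """Density (14) of G(n, beta) for integer shape n >= 1 and x >= 0."""
    return x ** (n - 1) * math.exp(-x / beta) / (math.gamma(n) * beta ** n)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def gamma_cdf(t, n, beta):
    """P{S_n <= t} for S_n ~ G(n, beta), by numerical integration."""
    return simpson(lambda x: gamma_pdf(x, n, beta), 0.0, t)


def count_probability(n, t, beta):
    """P{Y = n} from (46): P{S_n <= t} - P{S_{n+1} <= t}."""
    return gamma_cdf(t, n, beta) - gamma_cdf(t, n + 1, beta)


def poisson_pmf(n, lam):
    """P{Y = n} for a Poisson rv with parameter lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


if __name__ == "__main__":
    t, beta = 3.0, 1.5
    for n in range(1, 6):
        print(n, count_probability(n, t, beta), poisson_pmf(n, t / beta))
```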


Theorem 13. If X and Y are independent exponential rv's with parameter 
B, then Z = X/(X + Y) has a U(0, 1) distribution. 


Proof. The proof is essentially contained in Problem 4.4.2. 


Note that, in view of Theorem 7, Theorem 13 characterizes the exponential
distribution in the following sense. Let $X$ and $Y$ be independent rv's that
are nondegenerate and take only positive values. Suppose that $X + Y$ and
$X/Y$ are independent. If $X/(X + Y)$ is $U(0, 1)$, $X$ and $Y$ both have the
exponential distribution with parameter $\beta$. This follows since, by Theorem
7, $X$ and $Y$ must have the gamma distribution with parameter $\beta$. Thus
$X/(X + Y)$ must have (see Theorem 15) the pdf

$f(x) = \frac{x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}}{B(\alpha_1, \alpha_2)}, \quad 0 < x < 1,$

and this is the uniform density on $(0, 1)$ if and only if $\alpha_1 = \alpha_2 = 1$. Thus $X$
and $Y$ both have the $G(1, \beta)$ distribution.


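Theorem 14. Let $X$ be a $P(\lambda)$ rv. Then

(47)  $P\{X \le K\} = \frac{1}{K!} \int_\lambda^\infty e^{-x} x^{K}\,dx$

expresses the df of $X$ in terms of an incomplete gamma function.

Proof.

$\frac{d}{d\lambda} P\{X \le K\} = \sum_{j=0}^{K} \frac{1}{j!} \{ j e^{-\lambda} \lambda^{j-1} - \lambda^{j} e^{-\lambda} \} = -\frac{\lambda^{K} e^{-\lambda}}{K!},$

and it follows that

$P\{X \le K\} = \frac{1}{K!} \int_\lambda^\infty e^{-x} x^{K}\,dx,$

as asserted.
An alternative way of writing (47) is the following:
$P\{X \le K\} = P\{Y \ge 2\lambda\},$
where $X \sim P(\lambda)$, and $Y \sim \chi^2(2K + 2)$.

c. The Beta Distribution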
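For $\alpha > 0$, $\beta > 0$, the beta function $B(\alpha, \beta)$ is defined by

(48)  $B(\alpha, \beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx.$

It follows that

(49)  $f(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}$ for $0 < x < 1$, and $f(x) = 0$ otherwise,

defines a pdf.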


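Definition 4. An rv $X$ with pdf given by (49) is said to have a beta distribution
with parameters $\alpha$ and $\beta$, $\alpha > 0$, $\beta > 0$. We will write $X \sim B(\alpha, \beta)$ for
a beta variable with density (49).


The df of a $B(\alpha, \beta)$ rv is given by

(50)  $F(x) = 0$ for $x \le 0$; $F(x) = [B(\alpha, \beta)]^{-1} \int_0^x y^{\alpha - 1}(1 - y)^{\beta - 1}\,dy$ for $0 < x < 1$; $F(x) = 1$ for $x \ge 1$.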


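If $n$ is a positive number, then

$EX^n = \frac{1}{B(\alpha, \beta)} \int_0^1 x^{n + \alpha - 1}(1 - x)^{\beta - 1}\,dx$

(51)  $= \frac{B(n + \alpha, \beta)}{B(\alpha, \beta)} = \frac{\Gamma(n + \alpha)\,\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(n + \alpha + \beta)},$

using the fact that $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$. In particular,

(52)  $EX = \frac{\alpha}{\alpha + \beta}$
and
(53)  $\mathrm{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$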
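For the mgf of $X \sim B(\alpha, \beta)$, we have

(54)  $M(t) = \frac{1}{B(\alpha, \beta)} \int_0^1 e^{tx} x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx.$

Since moments of all order exist, and $E|X|^j < 1$ for all $j$, we have

$M(t) = \sum_{j=0}^{\infty} \frac{t^j}{j!}\, EX^j$

(55)  $= \sum_{j=0}^{\infty} \frac{t^j}{\Gamma(j + 1)}\, \frac{\Gamma(\alpha + j)\,\Gamma(\alpha + \beta)}{\Gamma(\alpha + \beta + j)\,\Gamma(\alpha)}.$


Remark 1. Note that in the special case where $\alpha = \beta = 1$ we get the uniform
distribution on $(0, 1)$.


Remark 2. If $X$ is a beta rv with parameters $\alpha$ and $\beta$, then $1 - X$ is a beta
variate with parameters $\beta$ and $\alpha$. In particular, $X$ is $B(\alpha, \alpha)$ if and only if
$1 - X$ is $B(\alpha, \alpha)$. A special case is the uniform distribution on $(0, 1)$. If $X$
and $1 - X$ have the same distribution, it does not follow that $X$ has to be
$B(\alpha, \alpha)$. All this entails is that the pdf satisfies

$f(x) = f(1 - x), \quad 0 < x < 1.$

Take

$f(x) = \frac{1}{B(\alpha, \beta) + B(\beta, \alpha)} \left[ x^{\alpha - 1}(1 - x)^{\beta - 1} + (1 - x)^{\alpha - 1} x^{\beta - 1} \right], \quad 0 < x < 1.$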


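Example 3. Let $X$ be distributed with pdf

$f(x) = 12x^2(1 - x)$ for $0 < x < 1$, and $f(x) = 0$ otherwise.

Then $X \sim B(3, 2)$ and

$EX^n = \frac{\Gamma(n + 3)\,\Gamma(5)}{\Gamma(3)\,\Gamma(n + 5)} = \frac{4!\,(n + 2)!}{2!\,(n + 4)!} = \frac{12}{(n + 4)(n + 3)},$

$EX = \frac{12}{20} = \frac{3}{5}, \quad \mathrm{var}(X) = \frac{6}{5^2 \cdot 6} = \frac{1}{25},$

$M(t) = \sum_{j=0}^{\infty} \frac{t^j}{j!}\, \frac{(j + 2)!\,4!}{2!\,(j + 4)!} = \sum_{j=0}^{\infty} \frac{t^j}{j!}\, \frac{12}{(j + 4)(j + 3)},$

and

$P\{.2 < X < .5\} = 12 \int_{.2}^{.5} (x^2 - x^3)\,dx = .285.$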
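The quantities in Example 3 are easy to recompute. The following sketch uses formula (51) for the moments and the antiderivative $4x^3 - 3x^4$ of $12x^2(1 - x)$ for the probability; with the factor 12 included, the last probability evaluates to about .285. Function names are hypothetical.

```python
import math


def beta_moment(alpha, beta, n):
    """EX^n for X ~ B(alpha, beta), using the gamma-function form (51)."""
    return (math.gamma(n + alpha) * math.gamma(alpha + beta)
            / (math.gamma(alpha) * math.gamma(n + alpha + beta)))


def example3_cdf(x):
    """df of the B(3, 2) density 12 x^2 (1 - x) on 0 <= x <= 1."""
    return 4 * x ** 3 - 3 * x ** 4


if __name__ == "__main__":
    mean = beta_moment(3, 2, 1)
    var = beta_moment(3, 2, 2) - mean ** 2
    prob = example3_cdf(0.5) - example3_cdf(0.2)
    print(mean, var, prob)
```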


Theorem 15. Let $X$ and $Y$ be independent $G(\alpha_1, \beta)$ and $G(\alpha_2, \beta)$ rv's,
respectively. Then $X/(X + Y)$ is a $B(\alpha_1, \alpha_2)$ rv.


Proof. The proof is left as an exercise.


Theorem 16. Let $X \sim B(\alpha_1, \beta_1)$ and $Y \sim B(\alpha_2, \beta_2)$, and let $X$ and $Y$ be
independent. Suppose that $\alpha_1 = \alpha_2 + \beta_2$. Then $XY$ is a $B(\alpha_2, \beta_1 + \beta_2)$ rv.
If $\alpha_2 = \alpha_1 + \beta_1$, then $XY$ is $B(\alpha_1, \beta_1 + \beta_2)$.


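Proof. For the pdf of $XY = Z$, say, we have, for $0 < z < 1$,

$f_Z(z) = \int_z^1 \frac{1}{x}\, \frac{x^{\alpha_1 - 1}(1 - x)^{\beta_1 - 1}}{B(\alpha_1, \beta_1)}\, \frac{(z/x)^{\alpha_2 - 1}(1 - z/x)^{\beta_2 - 1}}{B(\alpha_2, \beta_2)}\,dx$

$= \frac{z^{\alpha_2 - 1}}{B(\alpha_1, \beta_1)\,B(\alpha_2, \beta_2)} \int_z^1 (1 - x)^{\beta_1 - 1}(x - z)^{\beta_2 - 1}\,dx$

$= \frac{z^{\alpha_2 - 1}(1 - z)^{\beta_1 + \beta_2 - 1}}{B(\alpha_1, \beta_1)\,B(\alpha_2, \beta_2)} \int_0^1 u^{\beta_2 - 1}(1 - u)^{\beta_1 - 1}\,du,$

where we used $\alpha_1 = \alpha_2 + \beta_2$ and substituted $u = (x - z)/(1 - z)$ in the integral. Thus

$f_Z(z) = \frac{z^{\alpha_2 - 1}(1 - z)^{\beta_1 + \beta_2 - 1}\, B(\beta_2, \beta_1)}{B(\alpha_1, \beta_1)\,B(\alpha_2, \beta_2)}$

$= \frac{z^{\alpha_2 - 1}(1 - z)^{\beta_1 + \beta_2 - 1}}{B(\alpha_2, \beta_1 + \beta_2)}, \quad 0 < z < 1,$

as asserted. The second assertion follows similarly.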


We remark that a partial converse to Theorem 16 holds; that is, if $XY$ is
$B(\alpha, \beta)$, and $X$ and $Y$ are independent $B(\alpha_1, \beta_1)$ and $B(\alpha_2, \beta_2)$, respectively,
we have $\beta = \beta_1 + \beta_2$ and $\alpha = \alpha_1$ or $\alpha = \alpha_2$. This is easy to see, since, for
all $t > 0$, $EX^t\, EY^t = E(XY)^t$, so that

$\left( \frac{\alpha_1 + \beta_1 + t}{\alpha_1 + t} \right) \left( \frac{\alpha_2 + \beta_2 + t}{\alpha_2 + t} \right) = \frac{\alpha + \beta + t}{\alpha + t}$ holds for all $t \ge 0$.

Cross-multiplying and equating coefficients of $t^k$, $k = 0, 1, 2$, we get
$\beta = \beta_1 + \beta_2$ and either $\alpha = \alpha_1$ or $\alpha = \alpha_2$. (See Ramachandran [94].)

Theorem 17. Let $X_1, X_2, \ldots, X_k$ be independent beta rv's with $X_i \sim B(\alpha_i, \beta_i)$,
$i = 1, 2, \ldots, k$. Let $\alpha_i = \alpha_{i+1} + \beta_{i+1}$, $i = 1, 2, \ldots, k - 1$. Then $\prod_{i=1}^k X_i$ is
also a beta rv with parameters $(\alpha_k, \beta_1 + \beta_2 + \cdots + \beta_k)$.


Let $X_1, X_2, \ldots, X_n$ be iid rv's with the uniform distribution on $[0, 1]$.
Let $X_{(k)}$ be the $k$th-order statistic.


Theorem 18. The rv $X_{(k)}$ has a beta distribution with parameters $\alpha = k$ and
$\beta = n - k + 1$.


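Proof. Let $X$ be the number of $X_i$'s that lie in $[0, t]$. Then $X$ is $b(n, t)$. We
have

$P\{X_{(k)} \le t\} = P\{X \ge k\} = \sum_{j=k}^{n} \binom{n}{j} t^j (1 - t)^{n - j}.$

Also

$\frac{d}{dt} P\{X \ge k\} = \sum_{j=k}^{n} \binom{n}{j} \{ j t^{j-1}(1 - t)^{n-j} - (n - j) t^{j}(1 - t)^{n-j-1} \}$

$= \sum_{j=k}^{n} \left\{ n \binom{n-1}{j-1} t^{j-1}(1 - t)^{n-j} - n \binom{n-1}{j} t^{j}(1 - t)^{n-j-1} \right\}$

$= n \binom{n-1}{k-1} t^{k-1}(1 - t)^{n-k}.$

On integration, we get

$P\{X_{(k)} \le t\} = n \binom{n-1}{k-1} \int_0^t x^{k-1}(1 - x)^{n-k}\,dx,$

as asserted.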
Remark 3. Note that we have shown that, if $X$ is $b(n, p)$, then
(56)  $1 - P\{X < k\} = n \binom{n-1}{k-1} \int_0^p x^{k-1}(1 - x)^{n-k}\,dx,$
which expresses the df of $X$ in terms of an incomplete beta function.
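The relation between the binomial tail and the incomplete beta integral in Remark 3 can be verified numerically. The sketch below compares the binomial sum $P\{X \ge k\}$ with $n\binom{n-1}{k-1}\int_0^p x^{k-1}(1-x)^{n-k}\,dx$, evaluating the integral with a composite Simpson rule (an assumed quadrature choice); function names are hypothetical.

```python
import math


def binomial_upper_tail(n, p, k):
    """P{X >= k} for X ~ b(n, p), summed directly from the pmf."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


def incomplete_beta_form(n, p, k, steps=2000):
    """n * C(n-1, k-1) * integral_0^p x^(k-1) (1-x)^(n-k) dx (Simpson rule)."""
    def integrand(x):
        return x ** (k - 1) * (1 - x) ** (n - k)

    h = p / steps  # steps is even, as Simpson's rule requires
    total = integrand(0.0) + integrand(p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return n * math.comb(n - 1, k - 1) * total * h / 3


if __name__ == "__main__":
    n, p = 10, 0.3
    for k in range(1, n + 1):
        print(k, binomial_upper_tail(n, p, k), incomplete_beta_form(n, p, k))
```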
Theorem 19. Let $X_1, X_2, \ldots, X_n$ be independent rv's, and write $M_n = \max(X_1,
X_2, \ldots, X_n)$. Then $X_1, X_2, \ldots, X_n$ are iid $B(\alpha, 1)$ rv's if and only if $M_n \sim
B(\alpha n, 1)$.
Proof. The proof is simple.


See also the corollary to Theorem 11.

d. The Cauchy Distribution


Definition 5. An rv $X$ is said to have a Cauchy distribution with parameters
$\mu$ and $\theta$ if its pdf is given by
(57)  $f(x) = \frac{\mu}{\pi}\, \frac{1}{\mu^2 + (x - \theta)^2}, \quad -\infty < x < \infty, \ \mu > 0.$
We will write $X \sim C(\mu, \theta)$ for a Cauchy rv with density (57).
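We first check that (57) in fact defines a pdf. Substituting $y = (x - \theta)/\mu$,
we get

$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{dy}{1 + y^2} = \frac{2}{\pi} (\tan^{-1} y)\Big|_0^{\infty} = 1.$


The df of a $C(1, 0)$ rv is given by

(58)  $F(x) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1} x, \quad -\infty < x < \infty.$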


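Theorem 20. Let $X$ be a Cauchy rv with parameters $\mu$ and $\theta$. The moments
of order $< 1$ exist, but the moments of order $\ge 1$ do not exist for the rv $X$.


Proof. It suffices to consider the pdf

$f(x) = \frac{1}{\pi}\, \frac{1}{1 + x^2}, \quad -\infty < x < \infty.$

We have

$E|X|^{\alpha} = \frac{2}{\pi} \int_0^{\infty} \frac{x^{\alpha}}{1 + x^2}\,dx,$

and, letting $z = 1/(1 + x^2)$ in the integral, we get

$E|X|^{\alpha} = \frac{1}{\pi} \int_0^1 z^{(1 - \alpha)/2 - 1} (1 - z)^{(\alpha + 1)/2 - 1}\,dz,$

which converges for $\alpha < 1$ and diverges for $\alpha \ge 1$ (Widder [137], 374). This
completes the proof of the theorem.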


It follows from Theorem 20 that the mgf of a Cauchy rv does not exist. 
This creates some manipulative problems. 


Theorem 21. Let $X \sim C(\mu_1, \theta_1)$ and $Y \sim C(\mu_2, \theta_2)$ be independent rv's.
Then $X + Y$ is a $C(\mu_1 + \mu_2, \theta_1 + \theta_2)$ rv.


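Proof. For notational convenience we will prove the result in the special
case where $\mu_1 = \mu_2 = 1$ and $\theta_1 = \theta_2 = 0$, that is, where $X$ and $Y$ have the
common pdf

$f(x) = \frac{1}{\pi}\, \frac{1}{1 + x^2}, \quad -\infty < x < \infty.$

The proof in the general case follows along the same lines. If $Z = X + Y$,
the pdf of $Z$ is given by

$f_Z(z) = \frac{1}{\pi^2} \int_{-\infty}^{\infty} \frac{1}{1 + x^2}\, \frac{1}{1 + (z - x)^2}\,dx.$

Now

$\frac{1}{(1 + x^2)[1 + (z - x)^2]} = \frac{1}{z^2(z^2 + 4)} \left[ \frac{2zx}{1 + x^2} + \frac{z^2}{1 + x^2} + \frac{2z^2 - 2zx}{1 + (z - x)^2} + \frac{z^2}{1 + (z - x)^2} \right],$

so that

$f_Z(z) = \frac{1}{\pi^2}\, \frac{1}{z^2(z^2 + 4)} \left[ z \log \frac{1 + x^2}{1 + (z - x)^2} + z^2 \tan^{-1} x + z^2 \tan^{-1}(x - z) \right]_{-\infty}^{\infty}$

$= \frac{1}{\pi}\, \frac{2}{z^2 + 2^2}, \quad -\infty < z < \infty.$

It follows that, if $X$ and $Y$ are iid $C(1, 0)$ rv's, then $X + Y$ is a $C(2, 0)$ rv.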


Corollary. Let $X_1, X_2, \ldots, X_n$ be independent Cauchy rv's,
$X_k \sim C(\mu_k, \theta_k)$, $k = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^n X_k$ is a $C(\sum_{k=1}^n \mu_k, \sum_{k=1}^n \theta_k)$ rv.


In particular, if $X_1, X_2, \ldots, X_n$ are iid $C(1, 0)$ rv's, $n^{-1}S_n$ is also a $C(1, 0)$
rv. This is a remarkable result, the importance of which will become clear
in Chapter 6. Actually this property uniquely characterizes the Cauchy
distribution. If $F$ is a nondegenerate df with the property that $n^{-1}S_n$ also has
df $F$, then $F$ must be a Cauchy distribution. (See Thompson [130], 112.)

The proof of the following result is simple.


Theorem 22. Let $X$ be $C(\mu, 0)$. Then $\lambda/X$, where $\lambda$ is a constant, is a
$C(|\lambda|/\mu, 0)$ rv.

Corollary. $X$ is $C(1, 0)$ if and only if $1/X$ is $C(1, 0)$.


We emphasize that, if $X$ and $1/X$ have the same pdf on $(-\infty, \infty)$, it does
not follow that $X$ is $C(1, 0)$, for let $X$ be an rv with pdf

$f(x) = \frac{1}{4}$ if $|x| \le 1$, and $f(x) = \frac{1}{4x^2}$ if $|x| > 1$.

Then $X$ and $1/X$ have the same pdf, as can be easily checked.

[Footnote: Menon [80] has shown that we need the condition that both $X$ and $1/X$ be stable to
conclude that $X$ is Cauchy.]


Theorem 23. Let $X$ be a $U(-\pi/2, \pi/2)$ rv. Then $Y = \tan X$ is a Cauchy rv.
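Theorem 23 gives a simple way to generate Cauchy variates. The sketch below is an illustration with hypothetical function names: it draws $X$ uniformly on $(-\pi/2, \pi/2)$, returns $\tan X$, and checks the result against the $C(1, 0)$ df $F(y) = 1/2 + \pi^{-1}\tan^{-1} y$.

```python
import math
import random


def cauchy_cdf(y):
    """df of a C(1, 0) rv: 1/2 + arctan(y) / pi."""
    return 0.5 + math.atan(y) / math.pi


def sample_cauchy(n, seed=0):
    """Draw n C(1, 0) variates as tan(X) with X uniform on (-pi/2, pi/2)."""
    rng = random.Random(seed)
    return [math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]


if __name__ == "__main__":
    ys = sample_cauchy(20_000)
    # Empirical P{Y <= 1} should be close to F(1) = 3/4.
    print(sum(y <= 1.0 for y in ys) / len(ys), cauchy_cdf(1.0))
```

Because $P\{\tan X \le \tan x\} = P\{X \le x\} = (x + \pi/2)/\pi$, the df of $\tan X$ at $\tan x$ must equal `cauchy_cdf(tan x)`, which is the deterministic identity behind the theorem.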


Many important properties of the Cauchy distribution can be derived from 
this result. (See Pitman and Williams [89].) 


e. The Normal Distribution (the Gaussian Law) 


One of the most important distributions in the study of probability and
mathematical statistics is the normal distribution, which we will examine
presently.


Definition 6. An rv $X$ is said to have a standard normal distribution if its
pdf is given by

(59)  $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$


We first check that $\varphi$ defines a pdf. Let

$I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx.$

Then

$0 < e^{-x^2/2} < e^{-|x| + 1}, \quad -\infty < x < \infty,$

$\int_{-\infty}^{\infty} e^{-|x| + 1}\,dx = 2e,$

and it follows that $I$ exists. We have

$I = \int_0^{\infty} y^{-1/2} e^{-y/2}\,dy = \Gamma(\tfrac{1}{2})\, 2^{1/2} = \sqrt{2\pi}.$

Thus $\int_{-\infty}^{\infty} \varphi(x)\,dx = 1$, as required.


[Footnote: A nondegenerate distribution function $F$ is said to be stable if, for two iid rv's $X_1, X_2$
with common df $F$, and given constants $a_1, a_2 > 0$, we can find $\alpha > 0$ and $\beta(a_1, a_2)$ such
that the rv

$X_3 = \alpha^{-1}(a_1 X_1 + a_2 X_2 - \beta)$

again has the same distribution $F$. Examples are the Cauchy (see the corollary to Theorem
21) and normal (discussed in Section 5.3e) distributions. See also Section 6.6.]

Let us write $Y = \sigma X + \mu$, where $\sigma > 0$. Then the pdf of $Y$ is given by

$\psi(y) = \frac{1}{\sigma}\, \varphi\!\left( \frac{y - \mu}{\sigma} \right)$

(60)  $= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y - \mu)^2/(2\sigma^2)}, \quad -\infty < y < \infty; \ \sigma > 0, \ -\infty < \mu < \infty.$

Definition 7. An rv $X$ is said to have a normal distribution with parameters
$\mu$ ($-\infty < \mu < \infty$) and $\sigma$ ($> 0$) if its pdf is given by (60).


If $X$ is a normally distributed rv with parameters $\mu$ and $\sigma$, we will write
$X \sim N(\mu, \sigma^2)$. In this notation $\varphi$ defined by (59) is the pdf of an $N(0, 1)$ rv.
The df of an $N(0, 1)$ rv will be denoted by $\Phi(x)$, where

(61)  $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du.$

Clearly, if $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$. $Z$ is called a
standard normal rv. For the mgf of an $N(\mu, \sigma^2)$ rv, we have


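$M(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{x^2}{2\sigma^2} + x\left( t + \frac{\mu}{\sigma^2} \right) - \frac{\mu^2}{2\sigma^2} \right\} dx$

$= \exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right) \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{(x - \mu - \sigma^2 t)^2}{2\sigma^2} \right\} dx$

(62)  $= \exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right)$

for all real values of $t$. Moments of all order exist and may be computed
from the mgf. Thus

(63)  $EX = M'(t)|_{t=0} = (\mu + \sigma^2 t) M(t)|_{t=0} = \mu$
and
$EX^2 = M''(t)|_{t=0} = [M(t)\sigma^2 + (\mu + \sigma^2 t)^2 M(t)]_{t=0}$
(64)  $= \sigma^2 + \mu^2.$
Thus
(65)  $\mathrm{var}(X) = \sigma^2.$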

Clearly, the central moments of odd order are all zero. The central
moments of even order are as follows:

$E(X - \mu)^{2n} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2n} e^{-x^2/(2\sigma^2)}\,dx$  ($n$ is a positive integer)

$= \frac{\sigma^{2n}\, 2^{n}\, \Gamma(n + \tfrac{1}{2})}{\sqrt{\pi}}$

(66)  $= [(2n - 1)(2n - 3) \cdots 3 \cdot 1]\, \sigma^{2n}.$
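As for the absolute moment of order $\alpha$, for a standard normal rv $Z$ we
have

$E|Z|^{\alpha} = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} z^{\alpha} e^{-z^2/2}\,dz$

$= \frac{1}{\sqrt{2\pi}} \int_0^{\infty} y^{(\alpha + 1)/2 - 1} e^{-y/2}\,dy$

(67)  $= \frac{\Gamma[(\alpha + 1)/2]\, 2^{\alpha/2}}{\sqrt{\pi}}.$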


As remarked earlier, the normal distribution is one of the most important
distributions in probability and statistics, and for this reason the standard
normal distribution is available in tabular form. Table 2 on page 651
gives the probability $P\{Z > z\}$ for various values of $z$ ($> 0$) in the tail of
an $N(0, 1)$ rv. In this book we will write $z_{\alpha}$ for the value of $Z$ that satisfies
$\alpha = P\{Z > z_{\alpha}\}$, $0 < \alpha < 1$.


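Example 4. By Chebychev's inequality, if $E|X|^2 < \infty$, $EX = \mu$, and
$\mathrm{var}(X) = \sigma^2$, then
$P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2}.$
For $K = 2$, we get $P\{|X - \mu| \ge K\sigma\} \le .25$, and for $K = 3$, we have
$P\{|X - \mu| \ge K\sigma\} \le \frac{1}{9}$. If $X$ is, in particular, $N(\mu, \sigma^2)$, then
$P\{|X - \mu| \ge K\sigma\} = P\{|Z| \ge K\},$
where $Z$ is $N(0, 1)$. From Table 2 on page 651,
$P\{|Z| \ge 1\} = .317$, $P\{|Z| \ge 2\} = .046$, and $P\{|Z| \ge 3\} = .002$.
Thus practically all the distribution is concentrated within three standard
deviations of the mean.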
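The comparison in Example 4 between the Chebychev bound and the exact normal tail can be reproduced without a table. The sketch below uses the identity $P\{|Z| \ge K\} = \operatorname{erfc}(K/\sqrt{2})$ and the standard-library function `math.erfc`; the helper names are hypothetical.

```python
import math


def two_sided_normal_tail(k):
    """P{|Z| >= k} for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


def chebyshev_bound(k):
    """Chebychev upper bound 1 / k^2 on P{|X - mu| >= k sigma}."""
    return 1.0 / k ** 2


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(two_sided_normal_tail(k), 4), round(chebyshev_bound(k), 4))
```

For $K = 2$ and $K = 3$ the exact normal tail is far smaller than the Chebychev bound, which holds for every distribution with finite variance.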


Example 5. Let $X \sim N(3, 4)$. Then

$P\{2 < X \le 5\} = P\left\{ \frac{2 - 3}{2} < \frac{X - 3}{2} \le \frac{5 - 3}{2} \right\} = P\{-.5 < Z \le 1\}$

$= P\{Z \le 1\} - P\{Z \le -.5\}$

$= .841 - .309 = .532.$




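Theorem 24 (Feller [28], 175). Let $Z$ be a standard normal rv. Then

(68)  $P\{Z > x\} \approx \frac{1}{\sqrt{2\pi}\,x}\, e^{-x^2/2}$ as $x \to \infty$.

More precisely, for every $x > 0$,

(69)  $\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \left( \frac{1}{x} - \frac{1}{x^3} \right) < P\{Z > x\} < \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}.$


Proof. We have

(70)  $\frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2} \left( 1 - \frac{3}{y^4} \right) dy = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \left( \frac{1}{x} - \frac{1}{x^3} \right)$

and

(71)  $\frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2} \left( 1 + \frac{1}{y^2} \right) dy = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, \frac{1}{x},$

as can be checked on differentiation. Equation (69) follows immediately.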
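The two-sided bound on the normal tail, $\varphi(x)(1/x - 1/x^3) < P\{Z > x\} < \varphi(x)/x$ for $x > 0$, is easy to inspect numerically. This sketch (hypothetical helper names; `math.erfc` supplies the exact tail) prints the bounds and shows the ratio of the tail to the upper bound approaching 1.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def normal_tail(x):
    """P{Z > x} for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_bounds(x):
    """Lower and upper bounds on P{Z > x} for x > 0."""
    return phi(x) * (1.0 / x - 1.0 / x ** 3), phi(x) / x


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 4.0, 8.0):
        lower, upper = tail_bounds(x)
        print(x, lower, normal_tail(x), upper, normal_tail(x) / upper)
```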


Remark 4. The lower bound in (69) can be improved quite easily. Indeed,
from Problem 4.6.6, we have for $x > 0$


(2) PZ > x) > 4 7 ef (угуз хуу 


Theorem 25. Let $X_1, X_2, \ldots, X_n$ be independent rv's with $X_k \sim N(\mu_k, \sigma_k^2)$,
$k = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^n X_k$ is an $N(\sum_{k=1}^n \mu_k, \sum_{k=1}^n \sigma_k^2)$ rv.


Corollary 1. If $X_1, X_2, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ rv's, then $S_n$ is an $N(n\mu, n\sigma^2)$
rv and $n^{-1}S_n$ is an $N(\mu, \sigma^2/n)$ rv.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are iid $N(0, 1)$ rv's, then $n^{-1/2}S_n$ is also an
$N(0, 1)$ rv.


We remark that if $X_1, X_2, \ldots, X_n$ are iid rv's with $EX = 0$, $EX^2 = 1$ such
that $n^{-1/2}S_n$ also has the same distribution for each $n = 1, 2, \ldots$, that distribution
can only be $N(0, 1)$. This characterization of the normal distribution
will become clear when we study the central limit theorem in Chapter 6.


Theorem 26. Let $X$ and $Y$ be independent rv's. Then $X + Y$ is normally
distributed if and only if $X$ and $Y$ are both normal.


If $X$ and $Y$ are independent normal rv's, $X + Y$ is normal by Theorem 25.
The converse is due to Cramér [17] and will not be proved here.

Theorem 27. Let $X$ and $Y$ be independent rv's with common $N(0, 1)$
distribution. Then $X + Y$ and $X - Y$ are independent.


Proof. The proof is left as an exercise.
The converse is due to Bernstein [6] and is stated here without proof.


Theorem 28. If $X$ and $Y$ are independent rv's with the same distribution,
which has finite variance, and if $Z_1 = X + Y$ and $Z_2 = X - Y$ are
independent, all rv's $X$, $Y$, $Z_1$ and $Z_2$ are normally distributed.


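The following result generalizes Theorem 27.
Theorem 29. If $X_1, X_2, \ldots, X_n$ are independent normal rv's and
$\sum_{i=1}^n a_i b_i\, \mathrm{var}(X_i) = 0$, then $L_1 = \sum_{i=1}^n a_i X_i$ and $L_2 = \sum_{i=1}^n b_i X_i$ are
independent. Here $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are fixed (nonzero) real numbers.


Proof. Let $\mathrm{var}(X_i) = \sigma_i^2$, and assume without loss of generality that $EX_i = 0$,
$i = 1, 2, \ldots, n$. For any real numbers $\alpha$, $\beta$, and $t$,

$E e^{(\alpha L_1 + \beta L_2)t} = E \exp\left\{ t \sum_{i=1}^n (\alpha a_i + \beta b_i) X_i \right\}$

$= \prod_{i=1}^n \exp\left\{ \frac{t^2}{2} (\alpha a_i + \beta b_i)^2 \sigma_i^2 \right\}$

$= \exp\left\{ \frac{\alpha^2 t^2}{2} \sum_{i=1}^n a_i^2 \sigma_i^2 + \frac{\beta^2 t^2}{2} \sum_{i=1}^n b_i^2 \sigma_i^2 \right\}$  (since $\sum_{i=1}^n a_i b_i \sigma_i^2 = 0$)

$= \prod_{i=1}^n \exp\left\{ \frac{t^2 \alpha^2}{2} a_i^2 \sigma_i^2 \right\} \cdot \prod_{i=1}^n \exp\left\{ \frac{t^2 \beta^2}{2} b_i^2 \sigma_i^2 \right\}$

$= \prod_{i=1}^n E e^{t\alpha a_i X_i} \cdot \prod_{i=1}^n E e^{t\beta b_i X_i}$

$= E \exp\left( t\alpha \sum_{i=1}^n a_i X_i \right) \cdot E \exp\left( t\beta \sum_{i=1}^n b_i X_i \right)$

$= E e^{\alpha t L_1}\, E e^{\beta t L_2}.$

Thus we have shown that

$M(\alpha t, \beta t) = M(\alpha t, 0)\, M(0, \beta t)$ for all $\alpha, \beta, t$.

It follows that $L_1$ and $L_2$ are independent.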


Corollary. If $X_1, X_2$ are independent $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ rv's, then
$X_1 - X_2$ and $X_1 + X_2$ are independent. (This gives Theorem 27.)


Darmois [22] and Skitovitch [119] provided the converse of Theorem 29,
which we state without proof.

Theorem 30. If $X_1, X_2, \ldots, X_n$ are independent rv's, $a_1, a_2, \ldots, a_n, b_1, b_2,
\ldots, b_n$ are real numbers none of which equals zero, and if the linear forms

$L_1 = \sum_{i=1}^n a_i X_i, \quad L_2 = \sum_{i=1}^n b_i X_i$

are independent, then all the rv's are normally distributed.


Corollary. If $X$ and $Y$ are independent rv's such that $X + Y$ and $X - Y$
are independent, $X$, $Y$, $X + Y$ and $X - Y$ are all normal.


Yet another result of this type is the following theorem. 


Theorem 31. Let $X_1, X_2, \ldots, X_n$ be iid rv's with finite variance. Then the
common distribution is normal if and only if

$S_n = \sum_{k=1}^n X_k$ and $Y_n = \sum_{i=1}^n (X_i - n^{-1}S_n)^2$

are independent.

In Chapter 7 we will prove the necessity part of this result, which is basic
to the theory of t-tests in statistics (Chapter 10). See also Example 4.4.8.
The sufficiency part was proved by Lukacs [72], and we will not prove it here.
Theorem 32. If $X \sim N(0, 1)$, then $X^2 \sim \chi^2(1)$.

Proof. See Example 2.5.7 for the proof.

Corollary 1. If $X \sim N(\mu, \sigma^2)$, the rv $Z^2 = (X - \mu)^2/\sigma^2$ is $\chi^2(1)$.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent rv's and $X_k \sim N(\mu_k, \sigma_k^2)$,
$k = 1, 2, \ldots, n$, then $\sum_{k=1}^n (X_k - \mu_k)^2/\sigma_k^2$ is $\chi^2(n)$.


Theorem 33. Let $X$ and $Y$ be iid $N(0, \sigma^2)$ rv's. Then $X/Y$ is $C(1, 0)$.

Proof. For the proof see Example 4.4.9.




PROBLEMS 5.3 


1. Prove Theorem 1. 
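2. Let $X$ be an rv with pmf $p_k = P\{X = k\}$ given below. If $F$ is the corresponding
df, find the distribution of $F(X)$, in the following cases.

(a) $p_k = \binom{n}{k} p^k (1 - p)^{n-k}$, $k = 0, 1, 2, \ldots, n$; $0 < p < 1$.
(b) $p_k = e^{-\lambda} \lambda^k / k!$, $k = 0, 1, 2, \ldots$; $\lambda > 0$.
3. Let $Y_1 \sim U[0, 1]$, $Y_2 \sim U[0, Y_1]$, $\ldots$, $Y_n \sim U[0, Y_{n-1}]$. Show that

$Y_1 \sim X_1, \quad Y_2 \sim X_1 X_2, \quad \ldots, \quad Y_n \sim X_1 X_2 \cdots X_n,$
where $X_1, X_2, \ldots, X_n$ are iid $U[0, 1]$ rv's. If $U$ is the number of $Y_1, Y_2, \ldots, Y_n$
in $[t, 1]$, where $0 < t < 1$, show that $U$ has a Poisson distribution with parameter
$-\log t$.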
4. Let $X_1, X_2, \ldots, X_n$ be iid $U[0, 1]$ rv's. Prove by induction or otherwise that
$S_n = \sum_{k=1}^n X_k$ has the pdf

$f_n(x) = [(n - 1)!]^{-1} \sum_{k=0}^{n} (-1)^k \binom{n}{k} [\max(x - k, 0)]^{n-1}.$


5. $N$ numbers are chosen independently at random, one from each of the $N$
intervals $[0, L_i]$, $i = 1, 2, \ldots, N$. If the distribution of each random number is uniform
with respect to the length of the interval from which it is chosen, find the expected
value of the smallest of the $N$ numbers chosen. (Klamkin [59])
6. Prove (a) Theorem 6 and its corollary, and (b) Theorem 10.

7. Let $X$ be a nonnegative rv of the continuous type, and let $Y \sim U(0, X)$. Also,
let $Z = X - Y$. Then the rv's $Y$ and $Z$ are independent if and only if $X$ is $G(2, 1/\lambda)$
for some $\lambda > 0$. (Lamperti [66])
8. Let $X$ and $Y$ be independent rv's with common pdf $f(x) = \beta^{-\alpha} \alpha x^{\alpha - 1}$ if $0 < x < \beta$,
and $= 0$ otherwise; $\alpha \ge 1$. Let $U = \min(X, Y)$ and $V = \max(X, Y)$. Find the joint
pdf of $U$ and $V$ and the pdf of $U + V$. Show that $U/V$ and $V$ are independent.


9. Prove Theorem 15.
10. Prove Theorem 8. 
11. Prove Theorems 22 and 23. 


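12. Let $X_1, X_2, \ldots, X_n$ be independent rv's with $X_i \sim C(\mu_i, \lambda_i)$, $i = 1, 2, \ldots, n$.
Show that the rv $X = 1/\sum_{i=1}^n X_i^{-1}$ is also a Cauchy rv with parameters $\mu/(\lambda^2 + \mu^2)$
and $\lambda/(\lambda^2 + \mu^2)$, where

$\lambda = \sum_{i=1}^n \frac{\lambda_i}{\lambda_i^2 + \mu_i^2}$ and $\mu = \sum_{i=1}^n \frac{\mu_i}{\lambda_i^2 + \mu_i^2}.$


13. Let $X_1, X_2, \ldots, X_n$ be iid $C(1, 0)$ rv's and $a_i \ne 0$, $b_i$, $i = 1, 2, \ldots, n$, be any
real numbers. Find the distribution of $\sum_{i=1}^n 1/(a_i X_i + b_i)$.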


14. Suppose that the load of an airplane wing is a random variable $X$ with
$N(1000, 14400)$ distribution. The maximum load that the wing can withstand is an
rv $Y$, which is $N(1260, 2500)$. If $X$ and $Y$ are independent, find the probability
that the load encountered by the wing is less than its critical load.


15. Let $X \sim N(0, 1)$. Find the pdf of $Z = 1/X^2$. If $X$ and $Y$ are iid $N(0, 1)$,
deduce that $U = XY/\sqrt{X^2 + Y^2}$ is $N(0, 1/4)$.


16. In Problem 15 let $X$ and $Y$ be independent normal rv's with zero means. Show
that $U = XY/\sqrt{X^2 + Y^2}$ is normal. If, in addition, $\mathrm{var}(X) = \mathrm{var}(Y)$ show that
$V = (X^2 - Y^2)/\sqrt{X^2 + Y^2}$ is also normal. Moreover, $U$ and $V$ are independent.
(Shepp [114])

17. Let $X_1, X_2, X_3, X_4$ be independent $N(0, 1)$. Show that $Y = X_1 X_2 + X_3 X_4$ has
the pdf $f(y) = \frac{1}{2} e^{-|y|}$, $-\infty < y < \infty$.

18. Let $X \sim N(15, 16)$. Find (a) $P\{X \le 12\}$, (b) $P\{10 \le X \le 17\}$,
(c) $P\{10 \le X \le 19 \mid X \le 17\}$, (d) $P\{|X - 15| \ge .5\}$.

19. Let $X \sim N(-1, 9)$. Find $x$ such that $P\{X > x\} = .38$. Also, find $x$ such that
$P\{X + 1 < x\} = .4$.

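20. Let $X$ be an rv such that $\log(X - a)$ is $N(\mu, \sigma^2)$. Show that $X$ has pdf

$f(x) = \frac{1}{\sigma (x - a) \sqrt{2\pi}} \exp\left\{ -\frac{[\log(x - a) - \mu]^2}{2\sigma^2} \right\}$ if $x > a$, and $f(x) = 0$ if $x \le a$.

If $m_1, m_2$ are the first two moments of this distribution and $\alpha_3 = \mu_3/\mu_2^{3/2}$ is the
coefficient of skewness, show that $a$, $\mu$, $\sigma$ are given by

$a = m_1 - \frac{\sqrt{m_2 - m_1^2}}{\eta}, \quad \sigma^2 = \log(1 + \eta^2),$

and

$\mu = \log(m_1 - a) - \tfrac{1}{2}\sigma^2,$
where $\eta$ is the real root of the equation $\eta^3 + 3\eta - \alpha_3 = 0$.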
21. Prove Theorem 27. 


22. Let $X$ and $Y$ be iid $N(0, 1)$ rv's. Find the pdf of $X/|Y|$. Also, find the pdf of
$|X|/|Y|$.


23. Construct a continuous analogue of Problem 5.2.17.
24. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ rv's. Find the distribution of

$Y_n = \frac{\sum_{k=1}^n k X_k - \mu \sum_{k=1}^n k}{\left( \sum_{k=1}^n k^2 \right)^{1/2}}.$

25. Let $F_1, F_2, \ldots, F_n$ be $n$ df's. Show that $\min(F_1(x_1), F_2(x_2), \ldots, F_n(x_n))$ is an
$n$-dimensional df with marginal df's $F_1, F_2, \ldots, F_n$. (Kemp [55])


26. Let $X \sim NB(1; p)$ and $Y \sim G(1, 1/\lambda)$. Show that $X$ and $Y$ are related by the
equation
$P\{X \le x\} = P\{Y \le [x]\}$ for $x > 0$, $\lambda = \log\left( \frac{1}{1 - p} \right),$
where $[x]$ is the largest integer $\le x$. Equivalently, show that
$P\{Y \in (n - 1, n]\} = P_{\theta}\{X = n\},$
where $\theta = 1 - e^{-\lambda}$. (Prochaska [91])


In this section we introduce the bivariate and multivariate normal
distributions and investigate some of their important properties.


Definition. A two-dimensional rv $(X, Y)$ is said to have a bivariate normal
distribution if the joint pdf is of the form

(1)  $f(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}\, e^{-Q(x, y)/2}, \quad -\infty < x < \infty, \ -\infty < y < \infty,$

where $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$, and $Q$ is the quadratic form

(2)  $Q(x, y) = \frac{1}{1 - \rho^2} \left[ \left( \frac{x - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x - \mu_1}{\sigma_1} \right) \left( \frac{y - \mu_2}{\sigma_2} \right) + \left( \frac{y - \mu_2}{\sigma_2} \right)^2 \right].$

We first show that (1) indeed defines a joint pdf. In fact, we prove the 
following result. 


Theorem 1. The function defined by (1) and (2) with $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$
is a joint pdf. The marginal pdf's of $X$ and $Y$ are, respectively, $N(\mu_1, \sigma_1^2)$
and $N(\mu_2, \sigma_2^2)$, and $\rho$ is the correlation coefficient between $X$ and $Y$.


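Proof. Let $f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$. Note that

$(1 - \rho^2)\, Q(x, y) = \left( \frac{y - \mu_2}{\sigma_2} - \rho\, \frac{x - \mu_1}{\sigma_1} \right)^2 + (1 - \rho^2) \left( \frac{x - \mu_1}{\sigma_1} \right)^2$

$= \left\{ \frac{y - [\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)]}{\sigma_2} \right\}^2 + (1 - \rho^2) \left( \frac{x - \mu_1}{\sigma_1} \right)^2.$

It follows that

(3)  $f_1(x) = \frac{1}{\sigma_1 \sqrt{2\pi}} \exp\left\{ -\frac{(x - \mu_1)^2}{2\sigma_1^2} \right\} \int_{-\infty}^{\infty} \frac{1}{\sigma_2 \sqrt{1 - \rho^2}\, \sqrt{2\pi}} \exp\left\{ -\frac{(y - \beta_x)^2}{2\sigma_2^2 (1 - \rho^2)} \right\} dy,$

where we have written

(4)  $\beta_x = \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1} (x - \mu_1).$

The integrand is the pdf of an $N(\beta_x, \sigma_2^2(1 - \rho^2))$ rv, so that

$f_1(x) = \frac{1}{\sigma_1 \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{x - \mu_1}{\sigma_1} \right)^2 \right\}, \quad -\infty < x < \infty.$

Thus

$\int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f(x, y)\,dy \right] dx = \int_{-\infty}^{\infty} f_1(x)\,dx = 1,$

and $f(x, y)$ is a joint pdf of two rv's of the continuous type. It also follows
that $f_1$ is the marginal pdf of $X$, so that $X$ is $N(\mu_1, \sigma_1^2)$. In a similar
manner we can show that $Y$ is $N(\mu_2, \sigma_2^2)$.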

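Furthermore, we have

(5)  $\frac{f(x, y)}{f_1(x)} = \frac{1}{\sigma_2 \sqrt{1 - \rho^2}\, \sqrt{2\pi}} \exp\left\{ -\frac{(y - \beta_x)^2}{2\sigma_2^2 (1 - \rho^2)} \right\},$

where $\beta_x$ is given by (4). It is clear, then, that the conditional pdf $f_{Y|X}(y|x)$
given by (5) is also normal, with parameters $\beta_x$ and $\sigma_2^2(1 - \rho^2)$. We have

(6)  $E\{Y|x\} = \beta_x = \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1} (x - \mu_1).$

Since $\rho(\sigma_2/\sigma_1)$ is the coefficient of regression of $Y$ on $X$ and $\sigma_1$, $\sigma_2$ are the
standard deviations, it follows immediately that $\rho$ is the correlation coefficient
between $X$ and $Y$.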
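The factorization of the joint pdf into the marginal of $X$ and the conditional pdf of $Y$ given $x$ suggests a direct way to simulate a bivariate normal pair: draw $x$ from $N(\mu_1, \sigma_1^2)$, then draw $y$ from a normal distribution with mean $\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ and variance $\sigma_2^2(1 - \rho^2)$. The sketch below is illustrative only; the function names are hypothetical and `random.gauss` is assumed as the source of normal variates.

```python
import math
import random


def sample_bivariate_normal(n, mu1, mu2, sigma1, sigma2, rho, seed=0):
    """Draw n pairs (x, y): x from the marginal, then y from the conditional pdf."""
    rng = random.Random(seed)
    cond_sd = sigma2 * math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    pairs = []
    for _ in range(n):
        x = rng.gauss(mu1, sigma1)
        beta_x = mu2 + rho * (sigma2 / sigma1) * (x - mu1)  # conditional mean
        pairs.append((x, rng.gauss(beta_x, cond_sd)))
    return pairs


def sample_correlation(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    pairs = sample_bivariate_normal(50_000, 1.0, -2.0, 2.0, 0.5, 0.6)
    print(sample_correlation(pairs))  # should be close to rho = 0.6
```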

In a similar manner, we can show that

(7)  $E\{X|y\} = \beta_y = \mu_1 + \rho\, \frac{\sigma_1}{\sigma_2} (y - \mu_2)$

and that the conditional pdf of $X$, given $Y = y$, is $N(\beta_y, \sigma_1^2(1 - \rho^2))$. The proof
is now complete.


Remark 1. If $\rho^2 = 1$, then (1) becomes meaningless. But in that case we know
(Theorem 4.8.1) that there exist constants $a$ and $b$ such that $P\{Y = aX + b\}
= 1$. We thus have a univariate distribution, which is called the bivariate
degenerate (or singular) normal distribution. The bivariate degenerate normal
distribution does not have a pdf but corresponds to an rv $(X, Y)$ whose
marginal distributions are normal or degenerate and are such that $(X, Y)$
falls on a fixed line with probability 1. It is for this reason that degenerate
distributions are considered as normal distributions with variance 0.


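Next we compute the mgf $M(t_1, t_2)$ of a bivariate normal rv $(X, Y)$. We
have, if $f(x, y)$ is the pdf given in (1) and $f_1$ is the marginal pdf of $X$,

$M(t_1, t_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\,dx\,dy$

$= \int_{-\infty}^{\infty} e^{t_1 x} f_1(x) \left\{ \int_{-\infty}^{\infty} e^{t_2 y} f_{Y|X}(y|x)\,dy \right\} dx$

$= \int_{-\infty}^{\infty} e^{t_1 x} f_1(x) \exp\left\{ \tfrac{1}{2}\sigma_2^2 t_2^2 (1 - \rho^2) + t_2 \left[ \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1} (x - \mu_1) \right] \right\} dx$

$= \exp\left[ \tfrac{1}{2}\sigma_2^2 t_2^2 (1 - \rho^2) + t_2 \mu_2 - \rho t_2\, \frac{\sigma_2}{\sigma_1}\, \mu_1 \right] \int_{-\infty}^{\infty} e^{(t_1 + \rho t_2 \sigma_2/\sigma_1) x} f_1(x)\,dx.$

Now

$\int_{-\infty}^{\infty} e^{(t_1 + \rho t_2 \sigma_2/\sigma_1) x} f_1(x)\,dx = \exp\left[ \mu_1 \left( t_1 + \rho\, \frac{\sigma_2}{\sigma_1}\, t_2 \right) + \tfrac{1}{2}\sigma_1^2 \left( t_1 + \rho\, \frac{\sigma_2}{\sigma_1}\, t_2 \right)^2 \right].$

Therefore

(8)  $M(t_1, t_2) = \exp\left( \mu_1 t_1 + \mu_2 t_2 + \frac{\sigma_1^2 t_1^2 + \sigma_2^2 t_2^2 + 2\rho \sigma_1 \sigma_2 t_1 t_2}{2} \right).$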

The following result is an immediate consequence of (8). 


Theorem 2. If $(X, Y)$ has a bivariate normal distribution, $X$ and $Y$ are
independent if and only if $\rho = 0$.
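To see why Theorem 2 follows from the mgf, here is a short sketch. It assumes the standard bivariate normal mgf and the fact that $X$ and $Y$ are independent exactly when $M(t_1, t_2) = M(t_1, 0)\,M(0, t_2)$ for all $t_1, t_2$:

```latex
M(t_1, t_2)
  = \exp\!\left( \mu_1 t_1 + \tfrac{1}{2}\sigma_1^2 t_1^2 \right)
    \exp\!\left( \mu_2 t_2 + \tfrac{1}{2}\sigma_2^2 t_2^2 \right)
    \exp\!\left( \rho\,\sigma_1 \sigma_2\, t_1 t_2 \right)
  = M(t_1, 0)\, M(0, t_2)\, e^{\rho \sigma_1 \sigma_2 t_1 t_2}.
```

Since $\sigma_1, \sigma_2 > 0$, the last factor equals 1 for all $t_1, t_2$ if and only if $\rho = 0$.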


Remark 2. It is quite possible for an rv $(X, Y)$ to have a bivariate density
such that the marginal densities of $X$ and $Y$ are normal and the correlation
coefficient is 0, yet $X$ and $Y$ are not independent. Indeed, if the marginal
densities of $X$ and $Y$ are normal, it does not follow that the joint density
of $(X, Y)$ is a bivariate normal. Let

$$f(x, y) = \frac{1}{2}\left\{\frac{1}{2\pi(1-\rho^2)^{1/2}} \exp\left[-\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)}\right] + \frac{1}{2\pi(1-\rho^2)^{1/2}} \exp\left[-\frac{x^2 + 2\rho xy + y^2}{2(1-\rho^2)}\right]\right\}. \tag{9}$$

Here $f(x, y)$ is a joint pdf such that both marginal densities are normal,
$f(x, y)$ is not bivariate normal, and $X$ and $Y$ have zero correlation. But $X$
and $Y$ are not independent. We have

$$f_1(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty,$$

$$f_2(y) = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}, \quad -\infty < y < \infty,$$

$$EXY = 0.$$
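The claims of Remark 2 can be checked numerically. The following is a minimal sketch, assuming the mixture form of (9) with an illustrative value $\rho = 0.8$; the function names and the trapezoidal integration grid are choices made here for illustration only.

```python
import math

def bvn_pdf(x, y, rho):
    """Bivariate normal density with zero means, unit variances, correlation rho."""
    q = (x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def mixture_pdf(x, y, rho):
    """Density (9): equal mixture of correlations +rho and -rho."""
    return 0.5 * (bvn_pdf(x, y, rho) + bvn_pdf(x, y, -rho))

def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def marginal_x(x, rho, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal approximation of the integral of (9) over y."""
    h = (hi - lo) / steps
    total = 0.5 * (mixture_pdf(x, lo, rho) + mixture_pdf(x, hi, rho))
    for i in range(1, steps):
        total += mixture_pdf(x, lo + i * h, rho)
    return total * h

rho = 0.8  # illustrative value, not taken from the text
# The marginal of X agrees with the standard normal density ...
print(marginal_x(1.0, rho), std_normal_pdf(1.0))
# ... but the joint density is not the product of the marginals.
print(mixture_pdf(1.0, 1.0, rho), std_normal_pdf(1.0) ** 2)
```

The second printed pair differs, which is what non-independence requires; zero correlation follows from the symmetry of (9) in the sign of $\rho$.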
Example 1 (Rosenberg [103]). Let $f$ and $g$ be pdf's with corresponding df's
$F$ and $G$. Also, let

$$h(x, y) = f(x)g(y)[1 + \alpha(2F(x) - 1)(2G(y) - 1)], \tag{10}$$

where $|\alpha| \le 1$ is a constant. It was shown in Example 4.3.1 that $h$ is a
bivariate density function with given marginal densities $f$ and $g$.
In particular, take $f$ and $g$ to be the pdf of $\mathcal{N}(0, 1)$, that is,

$$f(x) = g(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty, \tag{11}$$

and let $(X, Y)$ have the joint pdf $h(x, y)$. We will show that $X + Y$ is not
normal except in the trivial case $\alpha = 0$, when $X$ and $Y$ are independent.
Let $Z = X + Y$. Then

$$EZ = 0, \qquad \operatorname{var}(Z) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y).$$

It is easy to show (Problem 2) that $\operatorname{cov}(X, Y) = \alpha/\pi$, so that $\operatorname{var}(Z) = 2[1 + (\alpha/\pi)]$.
If $Z$ is normal, its mgf must be

$$M_1(t) = e^{t^2[1 + (\alpha/\pi)]}. \tag{12}$$


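Next we compute the mgf of $Z$ directly from the joint pdf (10). We have

$$\begin{aligned}
M_2(t) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t(x+y)} f(x)f(y)[1 + \alpha(2F(x) - 1)(2F(y) - 1)]\,dx\,dy \\
&= e^{t^2} + \alpha\left[\int_{-\infty}^{\infty} e^{tx}[2F(x) - 1] f(x)\,dx\right]^2.
\end{aligned}$$

Now

$$\begin{aligned}
\int_{-\infty}^{\infty} e^{tx}[2F(x) - 1] f(x)\,dx &= -2\int_{-\infty}^{\infty} e^{tx}[1 - F(x)] f(x)\,dx + e^{t^2/2} \\
&= e^{t^2/2} - \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{x}^{\infty} \exp\{-\tfrac{1}{2}(x^2 + u^2 - 2tx)\}\,du\,dx \\
&= e^{t^2/2} - \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{0}^{\infty} \exp\{-\tfrac{1}{2}[x^2 + (v + x)^2 - 2tx]\}\,dv\,dx \\
&= e^{t^2/2} - \frac{1}{\pi}\int_{0}^{\infty} \exp\left\{-\frac{v^2}{2} + \frac{(v - t)^2}{4}\right\} \int_{-\infty}^{\infty} \exp\left\{-\left[x + \frac{v - t}{2}\right]^2\right\} dx\,dv \\
&= e^{t^2/2} - \frac{e^{t^2/2}}{\sqrt{\pi}}\int_{0}^{\infty} \exp\left\{-\frac{1}{2}\left(\frac{v + t}{\sqrt{2}}\right)^2\right\} dv \\
&= e^{t^2/2}\left[1 - 2P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}\right],
\end{aligned} \tag{13}$$

where $Z_1$ is an $\mathcal{N}(0, 1)$ rv.
It follows that

$$\begin{aligned}
M_2(t) &= e^{t^2} + \alpha\, e^{t^2}\left[1 - 2P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}\right]^2 \\
&= e^{t^2}\left[1 + \alpha\left(1 - 2P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}\right)^2\right].
\end{aligned} \tag{14}$$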


If $Z$ were normally distributed, we must have $M_1(t) = M_2(t)$ for all $t$ and
all $|\alpha| \le 1$, that is,

$$e^{t^2} e^{(\alpha/\pi)t^2} = e^{t^2}\left[1 + \alpha\left(1 - 2P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}\right)^2\right]. \tag{15}$$

For $\alpha = 0$, the equality clearly holds. The expression within the brackets on
the right side of (15) is bounded by $1 + \alpha$, whereas the expression $e^{(\alpha/\pi)t^2}$ is
unbounded, so the equality cannot hold for all $t$ and $\alpha$.
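A quick numerical comparison of the two mgf's makes the conclusion of Example 1 concrete. This is only a sketch: it assumes the forms of (12) and (14) for standard normal marginals and uses the identity $1 - 2P\{Z_1 > t/\sqrt{2}\} = \operatorname{erf}(t/2)$ for a standard normal $Z_1$; the function names are illustrative.

```python
import math

def mgf_if_normal(t, alpha):
    """(12): the mgf Z would have if it were normal with variance 2(1 + alpha/pi)."""
    return math.exp(t * t * (1.0 + alpha / math.pi))

def mgf_actual(t, alpha):
    """(14), written with erf(t/2) in place of 1 - 2 P{Z1 > t/sqrt(2)}."""
    return math.exp(t * t) * (1.0 + alpha * math.erf(t / 2.0) ** 2)

for t in (0.1, 1.0, 3.0):
    # The two agree closely for small t and separate as t grows.
    print(t, mgf_if_normal(t, 0.5), mgf_actual(t, 0.5))
```

For small $t$ the two expressions agree to second order in $t$, which is consistent with $Z$ having the variance computed above; the disagreement appears only in the higher-order terms.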


Next we investigate the multivariate normal distribution of dimension $n$,
$n \ge 2$. Let $M$ be an $n \times n$ real, symmetric, and positive definite matrix. Let
$\mathbf{x}$ denote the $1 \times n$ row vector of real numbers $(x_1, x_2, \ldots, x_n)$, and let $\boldsymbol{\mu}$
denote the row vector $(\mu_1, \mu_2, \ldots, \mu_n)$, where $\mu_i$ ($i = 1, 2, \ldots, n$) are real
constants.


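Theorem 3. The nonnegative function

$$f(\mathbf{x}) = c \exp\left\{-\frac{(\mathbf{x} - \boldsymbol{\mu})M(\mathbf{x} - \boldsymbol{\mu})'}{2}\right\}, \quad -\infty < x_i < \infty, \quad i = 1, 2, \ldots, n, \tag{16}$$

defines the joint pdf of some random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$, provided
that the constant $c$ is chosen appropriately. The mgf of $\mathbf{X}$ exists and is given
by

$$M(t_1, t_2, \ldots, t_n) = \exp\left\{\mathbf{t}\boldsymbol{\mu}' + \frac{\mathbf{t}M^{-1}\mathbf{t}'}{2}\right\}, \tag{17}$$

where $\mathbf{t} = (t_1, t_2, \ldots, t_n)$, and $t_1, t_2, \ldots, t_n$ are arbitrary real numbers.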
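Proof. Let

$$I = c \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left\{\mathbf{t}\mathbf{x}' - \frac{(\mathbf{x} - \boldsymbol{\mu})M(\mathbf{x} - \boldsymbol{\mu})'}{2}\right\} \prod_{i=1}^{n} dx_i. \tag{18}$$

Changing the variables of integration to $y_1, y_2, \ldots, y_n$ by writing $x_i - \mu_i = y_i$,
$i = 1, 2, \ldots, n$, and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, we have $\mathbf{x} - \boldsymbol{\mu} = \mathbf{y}$ and

$$I = c \exp(\mathbf{t}\boldsymbol{\mu}') \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left\{\mathbf{t}\mathbf{y}' - \frac{\mathbf{y}M\mathbf{y}'}{2}\right\} \prod_{i=1}^{n} dy_i. \tag{19}$$

Since $M$ is positive definite, it follows (see Section P.3) that all the $n$
characteristic roots of $M$, say $m_1, m_2, \ldots, m_n$, are positive. Moreover, since
$M$ is symmetric there exists (Theorem P.3.4) an $n \times n$ orthogonal matrix $L$
such that $L'ML$ is a diagonal matrix with diagonal elements $m_1, m_2, \ldots, m_n$.
Let us change the variables to $z_1, z_2, \ldots, z_n$ by writing $\mathbf{y}' = L\mathbf{z}'$, where
$\mathbf{z} = (z_1, z_2, \ldots, z_n)$, and note that the Jacobian of this orthogonal
transformation is $|L|$. Since $L'L = I_n$, where $I_n$ is an $n \times n$ unit matrix, $|L| = \pm 1$, so the
Jacobian has absolute value 1, and we have

$$I = c \exp(\mathbf{t}\boldsymbol{\mu}') \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left\{\mathbf{t}L\mathbf{z}' - \frac{\mathbf{z}L'ML\mathbf{z}'}{2}\right\} \prod_{i=1}^{n} dz_i. \tag{20}$$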


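If we write $\mathbf{t}L = \mathbf{u} = (u_1, u_2, \ldots, u_n)$, then $\mathbf{t}L\mathbf{z}' = \sum_{i=1}^{n} u_i z_i$. Also $L'ML = \operatorname{diag}(m_1, m_2, \ldots, m_n)$,
so that $\mathbf{z}L'ML\mathbf{z}' = \sum_{i=1}^{n} m_i z_i^2$. The integral in (20) can
therefore be written as

$$\prod_{i=1}^{n}\left[\int_{-\infty}^{\infty} \exp\left(u_i z_i - \frac{m_i z_i^2}{2}\right) dz_i\right] = \prod_{i=1}^{n}\left[\sqrt{\frac{2\pi}{m_i}} \exp\left(\frac{u_i^2}{2m_i}\right)\right].$$

It follows that

$$I = c \exp(\mathbf{t}\boldsymbol{\mu}')\,\frac{(2\pi)^{n/2}}{(m_1 m_2 \cdots m_n)^{1/2}} \exp\left(\sum_{i=1}^{n} \frac{u_i^2}{2m_i}\right). \tag{21}$$

Setting $t_1 = t_2 = \cdots = t_n = 0$, we see from (18) and (21) that

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n = c\,\frac{(2\pi)^{n/2}}{(m_1 m_2 \cdots m_n)^{1/2}}.$$

By choosing

$$c = \frac{(m_1 m_2 \cdots m_n)^{1/2}}{(2\pi)^{n/2}}, \tag{22}$$

we see that $f$ is a joint pdf of some random vector $\mathbf{X}$, as asserted.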
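Finally, since

$$(L'ML)^{-1} = \operatorname{diag}(m_1^{-1}, m_2^{-1}, \ldots, m_n^{-1}),$$

we have

$$\sum_{i=1}^{n} \frac{u_i^2}{m_i} = \mathbf{u}(L'M^{-1}L)\mathbf{u}' = \mathbf{t}M^{-1}\mathbf{t}'.$$

Also

$$|M^{-1}| = |L'M^{-1}L| = (m_1 m_2 \cdots m_n)^{-1}.$$

It follows from (21) and (22) that the mgf of $\mathbf{X}$ is given by (17), and we may
write

$$c = \frac{1}{(2\pi)^{n/2}\,|M^{-1}|^{1/2}}. \tag{23}$$

This completes the proof of Theorem 3.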
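For $n = 2$ the constant in (22) can be compared with the familiar leading constant $1/(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2})$ of the bivariate normal pdf. The sketch below assumes that the dispersion matrix $M^{-1}$ has entries $\sigma_1^2$, $\rho\sigma_1\sigma_2$, $\sigma_2^2$, inverts it by hand, and obtains the characteristic roots of $M$ from the closed form for a symmetric $2 \times 2$ matrix; all names and the numerical values are illustrative.

```python
import math

def constant_from_roots(s1, s2, rho):
    """c from (22) for n = 2, computed from the characteristic roots of M."""
    a, b, d = s1 * s1, rho * s1 * s2, s2 * s2       # entries of the dispersion matrix M^{-1}
    det = a * d - b * b
    m11, m12, m22 = d / det, -b / det, a / det      # entries of M
    half_trace = 0.5 * (m11 + m22)
    disc = math.sqrt(half_trace ** 2 - (m11 * m22 - m12 * m12))
    root1, root2 = half_trace + disc, half_trace - disc
    return math.sqrt(root1 * root2) / (2.0 * math.pi)

def bivariate_constant(s1, s2, rho):
    """Leading constant of the bivariate normal pdf."""
    return 1.0 / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))

print(constant_from_roots(4.0, 3.0, 0.6), bivariate_constant(4.0, 3.0, 0.6))
```

The two printed values agree, as (23) predicts, since the product of the characteristic roots of $M$ equals $|M| = 1/|M^{-1}|$.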


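Let us write $M^{-1} = ((\sigma_{ij}))$, $i, j = 1, 2, \ldots, n$. Then

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0) = \exp\left(t_i\mu_i + \sigma_{ii}\,\frac{t_i^2}{2}\right)$$

is the mgf of $X_i$, $i = 1, 2, \ldots, n$. Thus each $X_i$ is $\mathcal{N}(\mu_i, \sigma_{ii})$, $i = 1, 2, \ldots, n$.
For $i \ne j$, we have for the mgf of $X_i$ and $X_j$

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = \exp\left(t_i\mu_i + t_j\mu_j + \frac{\sigma_{ii}t_i^2 + 2\sigma_{ij}t_i t_j + \sigma_{jj}t_j^2}{2}\right).$$

This is the mgf of a bivariate normal distribution with means $\mu_i$, $\mu_j$, variances
$\sigma_{ii}$, $\sigma_{jj}$, and covariance $\sigma_{ij}$. Thus we see that

$$\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_n) \tag{24}$$

is the mean vector of $\mathbf{X} = (X_1, \ldots, X_n)$,

$$\sigma_{ii} = \sigma_i^2 = \operatorname{var}(X_i), \quad i = 1, 2, \ldots, n, \tag{25}$$

and

$$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j, \quad i \ne j; \; i, j = 1, 2, \ldots, n. \tag{26}$$

The matrix $M^{-1}$ is called the dispersion (variance-covariance) matrix of the
multivariate normal distribution.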


If $\sigma_{ij} = 0$ for $i \ne j$, the matrix $M^{-1}$ is a diagonal matrix, and it follows
that the rv's $X_1, X_2, \ldots, X_n$ are independent. Thus we have the following
analogue of Theorem 2.


Theorem 4. The components $X_1, X_2, \ldots, X_n$ of a normally distributed random
vector $\mathbf{X}$ are independent if and only if the covariances $\sigma_{ij} = 0$ for all
$i \ne j$ ($i, j = 1, 2, \ldots, n$).


The following result is stated without proof. The proof is similar to the
two-variate case (Theorem 4.8.1) except that now we consider the quadratic
form in $n$ variables: $E\{\sum_{i=1}^{n} t_i(X_i - \mu_i)\}^2 \ge 0$. (See P.2.4.)


Theorem 5. The probability that the rv's $X_1, X_2, \ldots, X_n$ with finite variances
satisfy at least one linear relationship is 1 if and only if $|M| = 0$.
Accordingly, if $|M| = 0$ all the probability mass is concentrated on a
hyperplane of dimension $< n$.


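Theorem 6. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be an $n$-dimensional rv with a normal
distribution. Let $Y_1, Y_2, \ldots, Y_k$, $k \le n$, be linear functions of $X_j$ ($j = 1, 2, \ldots, n$).
Then $(Y_1, Y_2, \ldots, Y_k)$ also has a multivariate normal distribution.


Proof. Without loss of generality let us assume that $EX_i = 0$, $i = 1, 2, \ldots, n$.
Let

$$Y_p = \sum_{j=1}^{n} A_{pj}X_j, \quad p = 1, 2, \ldots, k; \; k \le n. \tag{27}$$

Then $EY_p = 0$, $p = 1, 2, \ldots, k$, and

$$\operatorname{cov}(Y_p, Y_q) = \sum_{i,j=1}^{n} A_{pi}A_{qj}\sigma_{ij}, \tag{28}$$

where $E(X_iX_j) = \sigma_{ij}$, $i, j = 1, 2, \ldots, n$.
The mgf of $(Y_1, Y_2, \ldots, Y_k)$ is given by

$$M^*(t_1, t_2, \ldots, t_k) = E\left\{\exp\left(t_1\sum_{j=1}^{n} A_{1j}X_j + \cdots + t_k\sum_{j=1}^{n} A_{kj}X_j\right)\right\}.$$

Writing $u_j = \sum_{p=1}^{k} t_pA_{pj}$, $j = 1, 2, \ldots, n$, we have

$$\begin{aligned}
M^*(t_1, t_2, \ldots, t_k) &= E\left\{\exp\left(\sum_{i=1}^{n} u_iX_i\right)\right\} \\
&= \exp\left(\frac{1}{2}\sum_{i,j=1}^{n} \sigma_{ij}u_iu_j\right) \quad \text{by (17)} \\
&= \exp\left(\frac{1}{2}\sum_{i,j=1}^{n} \sigma_{ij}\sum_{l,m=1}^{k} t_lt_mA_{li}A_{mj}\right) \\
&= \exp\left(\frac{1}{2}\sum_{l,m=1}^{k} t_lt_m\sum_{i,j=1}^{n} A_{li}A_{mj}\sigma_{ij}\right) \\
&= \exp\left\{\frac{1}{2}\sum_{l,m=1}^{k} t_lt_m\operatorname{cov}(Y_l, Y_m)\right\}.
\end{aligned} \tag{29}$$

When (17) and (29) are compared, the result follows.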
Corollary 1. Every marginal distribution of an $n$-dimensional normal
distribution is univariate normal. Moreover, any linear function of $X_1, X_2, \ldots, X_n$
is univariate normal.

Corollary 2. If $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(\mu, \sigma^2)$, and $A$ is an $n \times n$ orthogonal
transformation matrix, the components $Y_1, Y_2, \ldots, Y_n$ of $\mathbf{Y} = \mathbf{X}A'$,
where $\mathbf{X} = (X_1, \ldots, X_n)$, are independent rv's, each normally distributed
with the same variance $\sigma^2$.

We have from (27) and (28)

$$\operatorname{cov}(Y_p, Y_q) = \sum_{i=1}^{n} A_{pi}A_{qi}\sigma_{ii} + \sum_{i \ne j} A_{pi}A_{qj}\sigma_{ij} = \begin{cases} 0 & \text{if } p \ne q, \\ \sigma^2 & \text{if } p = q, \end{cases}$$

since $\sum_{j=1}^{n} A_{pj}A_{qj} = 0$ for $p \ne q$ and $\sum_{j=1}^{n} A_{pj}^2 = 1$. It follows that

$$M^*(t_1, t_2, \ldots, t_n) = \exp\left(\frac{1}{2}\sum_{l=1}^{n} t_l^2\sigma^2\right),$$

and Corollary 2 follows.
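The covariance computation behind Corollary 2 is easy to reproduce for a concrete orthogonal matrix. The following sketch assumes a $2 \times 2$ rotation through an illustrative angle and $\sigma^2 = 4$; it evaluates (28) with $\sigma_{ij} = \sigma^2$ for $i = j$ and $0$ otherwise. The helper names are hypothetical.

```python
import math

def rotation(theta):
    """A 2 x 2 orthogonal (rotation) matrix."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def cov_of_transformed(A, sigma2):
    """cov(Y_p, Y_q) = sigma2 * sum_j A[p][j] * A[q][j], i.e. (28) for iid components."""
    n = len(A)
    return [[sigma2 * sum(A[p][j] * A[q][j] for j in range(n)) for q in range(n)]
            for p in range(n)]

cov = cov_of_transformed(rotation(0.7), 4.0)
print(cov)  # diagonal entries equal sigma2, off-diagonal entries vanish up to rounding
```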


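Theorem 7. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$. Then $\mathbf{X}$ has an $n$-dimensional normal
distribution if and only if every linear function of $\mathbf{X}$

$$\mathbf{X}\mathbf{t}' = t_1X_1 + t_2X_2 + \cdots + t_nX_n$$

has a univariate normal distribution.


Proof. Suppose that $\mathbf{X}\mathbf{t}'$ is normal for any $\mathbf{t}$. Then the mgf of $\mathbf{X}\mathbf{t}'$ is given
by

$$M(s) = \exp\left(bs + \tfrac{1}{2}\sigma^2s^2\right). \tag{30}$$

Here $b = E(\mathbf{X}\mathbf{t}') = \sum_{i=1}^{n} t_i\mu_i = \mathbf{t}\boldsymbol{\mu}'$, where $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)$, and $\sigma^2 = \operatorname{var}(\mathbf{X}\mathbf{t}') = \operatorname{var}(\sum_{i=1}^{n} t_iX_i) = \mathbf{t}M^{-1}\mathbf{t}'$,
where $M^{-1}$ is the dispersion matrix of $\mathbf{X}$. Thus

$$M(s) = \exp\left(\mathbf{t}\boldsymbol{\mu}'s + \tfrac{1}{2}\mathbf{t}M^{-1}\mathbf{t}'s^2\right). \tag{31}$$

Let $s = 1$; then

$$M(1) = \exp\left(\mathbf{t}\boldsymbol{\mu}' + \tfrac{1}{2}\mathbf{t}M^{-1}\mathbf{t}'\right), \tag{32}$$

and since the mgf is unique, it follows that $\mathbf{X}$ has a multivariate normal
distribution. The converse follows from Corollary 1 to Theorem 6.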


Many characterization results for the multivariate normal distribution are
now available. We refer the reader to Lukacs and Laha [74], page 79.


PROBLEMS 5.4 


1. For a bivariate normal rv $(X, Y)$ does the conditional probability density
function of $(X, Y)$, given $X + Y = t$, exist? If so, find it. If not, why not?

2. In Example 1 show that $\operatorname{cov}(X, Y) = \alpha/\pi$.

3. Let $(X, Y)$ be a bivariate normal rv with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.
What is the distribution of $X + Y$? Compare your result with that of Example 1.

4. Let $(X, Y)$ be a bivariate normal rv with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$,
and let $U = aX + b$, $a \ne 0$, and $V = cY + d$, $c \ne 0$. Find the joint distribution
of $(U, V)$.

5. Let $(X, Y)$ be a bivariate normal rv with parameters $\mu_1 = 5$, $\mu_2 = 8$, $\sigma_1^2 = 16$,
$\sigma_2^2 = 9$, and $\rho = .6$. Find $P\{5 < Y < 11 \mid X = 2\}$.

6. Let $X$ and $Y$ be jointly normal with means 0. Also, let

$$W = X\cos\theta + Y\sin\theta, \qquad Z = X\cos\theta - Y\sin\theta.$$

Find $\theta$ such that $W$ and $Z$ are independent.

7. Let $(X, Y)$ be a normal rv with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Find a
necessary and sufficient condition for $X + Y$ and $X - Y$ to be independent.

8. Let $(X, Y)$ be a bivariate normal rv with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.
Find the mean square error of the best predictor (in the sense of least squares) of
$Y$ given $X$.

9. Show that every variance-covariance matrix is positive semidefinite and
conversely. If the variance-covariance matrix is not positive definite, then with
probability 1 the random vector $\mathbf{X}$ lies in some hyperplane $\mathbf{c}\mathbf{X}' = a$ with $\mathbf{c} \ne \mathbf{0}$.

10. Let $(X, Y)$ be a bivariate normal rv with $EX = EY = 0$, $\operatorname{var}(X) = \operatorname{var}(Y) = 1$,
and $\operatorname{cov}(X, Y) = \rho$. Show that the rv $Z = Y/X$ has a Cauchy
distribution.


5.5 THE EXPONENTIAL FAMILY OF DISTRIBUTIONS 


Most of the distributions that we have so far encountered belong to a general
family of distributions that we now study. Let $\Theta$ be an interval on the real
line, and let $\{f_\theta : \theta \in \Theta\}$ be a family of pdf's (pmf's). We will assume that the
set $\{f_\theta(\mathbf{x}) > 0\}$ is independent of $\theta$. Here and in what follows we write
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$ unless otherwise specified.


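Definition 1. If there exist real-valued functions $Q(\theta)$ and $D(\theta)$ on $\Theta$ and
Borel-measurable functions $T(x_1, x_2, \ldots, x_n)$ and $S(x_1, x_2, \ldots, x_n)$ on $\mathcal{R}_n$, such
that

$$f_\theta(x_1, x_2, \ldots, x_n) = \exp\{Q(\theta)T(\mathbf{x}) + D(\theta) + S(\mathbf{x})\}, \tag{1}$$

we say that the family $\{f_\theta : \theta \in \Theta\}$ is a one-parameter exponential family.


Let $X_1, X_2, \ldots, X_m$ be iid with pmf (pdf) $f_\theta$. Then the joint distribution
of $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ is given by

$$\begin{aligned}
g_\theta(\mathbf{x}) &= \prod_{j=1}^{m} f_\theta(\mathbf{x}_j) = \prod_{j=1}^{m} \exp\{Q(\theta)T(\mathbf{x}_j) + D(\theta) + S(\mathbf{x}_j)\} \\
&= \exp\left\{Q(\theta)\sum_{j=1}^{m} T(\mathbf{x}_j) + mD(\theta) + \sum_{j=1}^{m} S(\mathbf{x}_j)\right\},
\end{aligned}$$

where $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$, $\mathbf{x}_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$, $j = 1, 2, \ldots, m$, and it
follows that $\{g_\theta : \theta \in \Theta\}$ is again a one-parameter exponential family.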
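Example 1. Let $X \sim \mathcal{N}(\mu_0, \sigma^2)$, where $\mu_0$ is known and $\sigma^2$ unknown. Then

$$\begin{aligned}
f_{\sigma^2}(x) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu_0)^2}{2\sigma^2}\right\} \\
&= \exp\left\{-\log(\sigma\sqrt{2\pi}) - \frac{(x - \mu_0)^2}{2\sigma^2}\right\}
\end{aligned}$$

is a one-parameter exponential family with

$$Q(\sigma^2) = -\frac{1}{2\sigma^2}, \quad T(x) = (x - \mu_0)^2, \quad S(x) = 0, \quad \text{and} \quad D(\sigma^2) = -\log(\sigma\sqrt{2\pi}).$$

If $X \sim \mathcal{N}(\mu, \sigma_0^2)$, where $\sigma_0$ is known but $\mu$ is unknown, then

$$\begin{aligned}
f_\mu(x) &= \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma_0^2}\right\} \\
&= \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma_0^2} + \frac{\mu x}{\sigma_0^2} - \frac{\mu^2}{2\sigma_0^2}\right)
\end{aligned}$$

is a one-parameter exponential family with

$$Q(\mu) = \frac{\mu}{\sigma_0^2}, \quad D(\mu) = -\frac{\mu^2}{2\sigma_0^2}, \quad T(x) = x,$$

and

$$S(x) = -\left[\frac{x^2}{2\sigma_0^2} + \tfrac{1}{2}\log(2\pi\sigma_0^2)\right].$$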

Example 2. Let $X \sim P(\lambda)$, $\lambda > 0$ unknown. Then

$$P_\lambda\{X = x\} = e^{-\lambda}\,\frac{\lambda^x}{x!} = \exp\{-\lambda + x\log\lambda - \log(x!)\},$$

and we see that the family of Poisson pmf's with parameter $\lambda$ is a
one-parameter exponential family.
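The decomposition in Example 2 can be verified numerically. In the sketch below the components are read off from the exponent as $Q(\lambda) = \log\lambda$, $T(x) = x$, $D(\lambda) = -\lambda$, and $S(x) = -\log(x!)$; the function names and the value $\lambda = 2.5$ are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """Poisson pmf written in the usual way."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_expfam(x, lam):
    """The same pmf written as exp{Q(lam) T(x) + D(lam) + S(x)}."""
    Q = math.log(lam)
    T = x
    D = -lam
    S = -math.lgamma(x + 1)   # -log(x!)
    return math.exp(Q * T + D + S)

for x in range(5):
    print(x, poisson_pmf(x, 2.5), poisson_expfam(x, 2.5))
```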
Some other important examples of one-parameter exponential families are
binomial, $G(\alpha, \beta)$ (provided that one of $\alpha$, $\beta$ is fixed), $B(\alpha, \beta)$ (provided that
one of $\alpha$, $\beta$ is fixed), negative binomial, and geometric. The Cauchy family
of densities and the uniform distribution on $[0, \theta]$ do not belong to this class.


Theorem 1. Let $\{f_\theta : \theta \in \Theta\}$ be a one-parameter exponential family of pdf's
(pmf's) given in (1). Then the family of distributions of $T(\mathbf{X})$ is also a
one-parameter exponential family of pdf's (pmf's), given by

$$g_\theta(t) = \exp\{tQ(\theta) + D(\theta) + S^*(t)\}$$

for suitable $S^*(t)$.


Proof. The proof of Theorem 1 is a simple application of the transformation
of variables technique studied in Section 4.4 and is left as an exercise, at
least for the cases considered in Section 4.4. For the general case we refer
to Lehmann [70], page 52.


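Let us now consider the $k$-parameter exponential family, $k \ge 2$. Let
$\Theta \subseteq \mathcal{R}_k$ be a $k$-dimensional interval, and assume that $\{f_{\boldsymbol{\theta}} > 0\}$ does not
depend on $\boldsymbol{\theta}$.


Definition 2. If there exist real-valued functions $Q_1, Q_2, \ldots, Q_k, D$ defined
on $\Theta$, and Borel-measurable functions $T_1, T_2, \ldots, T_k, S$ on $\mathcal{R}_n$, such that

$$f_{\boldsymbol{\theta}}(\mathbf{x}) = \exp\left\{\sum_{i=1}^{k} Q_i(\boldsymbol{\theta})T_i(\mathbf{x}) + D(\boldsymbol{\theta}) + S(\mathbf{x})\right\}, \tag{2}$$

we say that the family $\{f_{\boldsymbol{\theta}} : \boldsymbol{\theta} \in \Theta\}$ is a $k$-parameter exponential family.


Once again, if $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ and $X_j$ are iid with common
distribution (2), the joint distributions of $\mathbf{X}$ form a $k$-parameter exponential family.
An analogue of Theorem 1 also holds for the $k$-parameter exponential family.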
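Example 3. The most important example of a $k$-parameter exponential
family is $\mathcal{N}(\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown. We have

$$\boldsymbol{\theta} = (\mu, \sigma^2), \qquad \Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty, \; \sigma^2 > 0\},$$

and

$$f_{\boldsymbol{\theta}}(x) = \exp\left[-\frac{x^2}{2\sigma^2} + \frac{\mu}{\sigma^2}\,x - \frac{1}{2}\left(\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right].$$

It follows that $f_{\boldsymbol{\theta}}$ is a two-parameter exponential family with

$$Q_1(\boldsymbol{\theta}) = -\frac{1}{2\sigma^2}, \quad Q_2(\boldsymbol{\theta}) = \frac{\mu}{\sigma^2}, \quad T_1(x) = x^2, \quad T_2(x) = x,$$

$$D(\boldsymbol{\theta}) = -\frac{1}{2}\left[\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right], \quad \text{and} \quad S(x) = 0.$$

Other examples are the $G(\alpha, \beta)$ and $B(\alpha, \beta)$ distributions when both $\alpha$, $\beta$
are unknown, and the multinomial distribution. $U[\alpha, \beta]$ does not belong to
this family, nor does $\mathcal{C}(\alpha, \beta)$.

Some general properties of exponential families will be studied in Chapter
8, and the importance of these families will then become evident.
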


PROBLEMS 5.5 


1. Show that the following families of distributions are one-parameter
exponential families:
(a) $X \sim b(n, p)$.
(b) $X \sim G(\alpha, \beta)$, (i) if $\alpha$ is known, (ii) if $\beta$ is known.
(c) $X \sim B(\alpha, \beta)$, (i) if $\alpha$ is known, (ii) if $\beta$ is known.
(d) $X \sim NB(r; p)$, where $r$ is known, $p$ unknown.
2. Let $X \sim \mathcal{C}(1, \theta)$. Show that the family of distributions of $X$ is not a
one-parameter exponential family.
3. Let $X \sim U[0, \theta]$, $\theta \in [0, \infty)$. Show that the family of distributions of $X$ is not
an exponential family.
4. Is the family of pdf's

$$f_\theta(x) = \tfrac{1}{2}\,e^{-|x - \theta|}, \quad -\infty < x < \infty; \; \theta \in (-\infty, \infty),$$

an exponential family?

5. Show that the following families of distributions are two-parameter
exponential families.

(a) $X \sim G(\alpha, \beta)$, both $\alpha$ and $\beta$ unknown.

(b) $X \sim B(\alpha, \beta)$, both $\alpha$ and $\beta$ unknown.

6. Show that the families of distributions $U[\alpha, \beta]$ and $\mathcal{C}(\alpha, \beta)$ do not belong to
the exponential families.

7. Show that the multinomial distributions form an exponential family.


CHAPTER 6 


Limit Theorems 


6.1 INTRODUCTION 


In this chapter we investigate convergence properties of sequences of random 
variables. The three limit results proved here, namely, the two laws of large
numbers and the central limit theorem, are of considerable importance in the
study of probability and statistics. Just as in analysis, we distinguish among 
several types of convergence. The various modes of convergence are intro- 
duced in Section 2. Sections 3 and 4 deal with the laws of large numbers, 
and the central limit theorem is proved in Section 6. 

The reader may find some parts of this chapter difficult, at least on the
first reading. These have been identified with a dagger (†) and include the
concept of almost sure convergence (Section 2), the strong law of large numbers
(Section 4), and the proof of the central limit theorem (Theorem 6.6.1). An 
alternative proof of the central limit theorem is provided in Problem 6.6.4, 
which the reader may find simpler although the conditions are more restric- 
tive. Since the central limit result is basic and will be used repeatedly in the 
rest of the book, it is important for the reader to familiarize himself with 
this result and its application and to understand its significance. He can pick 
up the proof of Theorem 6.6.1 later if he so desires. Similarly, on the first 
reading it will suffice to know the strong law of large numbers and to under- 
stand its significance. 


6.2. MODES OF CONVERGENCE 


In this section we consider several modes of convergence and investigate 
their interrelationships. We begin with the weakest mode. 


Definition 1. Let $\{F_n\}$ be a sequence of distribution functions. If there
exists a df $F$ such that, as $n \to \infty$,

$$F_n(x) \to F(x) \tag{1}$$

at every point $x$ at which $F$ is continuous, we say that $F_n$ converges in law
(or, weakly) to $F$, and we write $F_n \xrightarrow{w} F$.

If $\{X_n\}$ is a sequence of rv's and $\{F_n\}$ is the corresponding sequence of
df's, we say that $X_n$ converges in distribution (or law) to $X$ if there exists
an rv $X$ with df $F$ such that $F_n \xrightarrow{w} F$. We write $X_n \xrightarrow{L} X$.


It must be remembered that it is quite possible for a given sequence of
df's to converge to a function that is not a df.


Example 1. Consider the sequence of df's

$$F_n(x) = \begin{cases} 0, & x < n, \\ 1, & x \ge n. \end{cases}$$

Here $F_n(x)$ is the df of the rv $X_n$ degenerate at $x = n$. We see that $F_n(x)$
converges to a function $F$ that is identically equal to 0, and hence is not a df.


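Example 2. Let $X_1, X_2, \ldots, X_n$ be iid rv's with common density function

$$f(x) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta \quad (0 < \theta < \infty), \\ 0, & \text{otherwise}. \end{cases}$$

Let $M_n = \max(X_1, X_2, \ldots, X_n)$. Then the density function of $M_n$ is

$$f_n(x) = \begin{cases} \dfrac{nx^{n-1}}{\theta^n}, & 0 < x < \theta, \\ 0, & \text{otherwise}, \end{cases}$$

and the df of $M_n$ is

$$F_n(x) = \begin{cases} 0, & x < 0, \\ (x/\theta)^n, & 0 \le x < \theta, \\ 1, & x \ge \theta. \end{cases}$$

We see that, as $n \to \infty$,

$$F_n(x) \to F(x) = \begin{cases} 0, & x < \theta, \\ 1, & x \ge \theta, \end{cases}$$

which is a df. Thus $F_n \xrightarrow{w} F$.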
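The convergence in Example 2 is fast enough to see directly from the formula for $F_n$. A minimal sketch, assuming $\theta = 1$ for illustration (the function name is hypothetical):

```python
def df_max_uniform(x, n, theta):
    """F_n(x): the df of the maximum of n iid uniform(0, theta) rv's."""
    if x < 0:
        return 0.0
    if x < theta:
        return (x / theta) ** n
    return 1.0

for n in (1, 10, 100, 1000):
    # Below theta the df tends to 0; at theta it equals 1 for every n.
    print(n, df_max_uniform(0.9, n, 1.0), df_max_uniform(1.0, n, 1.0))
```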


The following example shows that convergence in distribution does not
imply convergence of moments.

Example 3. Let $F_n$ be a sequence of df's defined by

$$F_n(x) = \begin{cases} 0, & x < 0, \\ 1 - \dfrac{1}{n}, & 0 \le x < n, \\ 1, & n \le x. \end{cases}$$

Clearly $F_n \xrightarrow{w} F$, where $F$ is the df given by

$$F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0. \end{cases}$$

Note that $F_n$ is the df of the rv $X_n$ with pmf

$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = n\} = \frac{1}{n},$$

and $F$ is the df of the rv $X$ degenerate at 0. We have

$$EX_n^k = n^k\left(\frac{1}{n}\right) = n^{k-1},$$

where $k$ is a positive integer. Also $EX^k = 0$. So that

$$EX_n^k \nrightarrow EX^k \quad \text{for any } k.$$
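The moment computation in Example 3 can be reproduced exactly with rational arithmetic. This sketch assumes the two-point pmf $P\{X_n = 0\} = 1 - 1/n$, $P\{X_n = n\} = 1/n$; the function name is illustrative.

```python
from fractions import Fraction

def kth_moment(n, k):
    """E X_n^k for P{X_n = 0} = 1 - 1/n and P{X_n = n} = 1/n, with k >= 1."""
    # The atom at 0 contributes nothing for k >= 1.
    return Fraction(n) ** k * Fraction(1, n)

for n in (1, 10, 100):
    print(n, kth_moment(n, 1), kth_moment(n, 2))
```

The first moment stays at 1 and the higher moments grow without bound, while every moment of the limit rv degenerate at 0 is 0.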


We next give an example to show that weak convergence of distribution
functions does not imply the convergence of corresponding pmf's or pdf's.


Example 4. Let $\{X_n\}$ be a sequence of rv's with pmf

$$f_n(x) = P\{X_n = x\} = \begin{cases} 1 & \text{if } x = 2 + \dfrac{1}{n}, \\ 0 & \text{otherwise}. \end{cases}$$

Note that none of the $f_n$'s assigns any probability to the point $x = 2$. It
follows that

$$f_n(x) \to f(x) \quad \text{as } n \to \infty,$$

where $f(x) = 0$ for all $x$. However, the sequence of df's $\{F_n\}$ of rv's $X_n$
converges to the function

$$F(x) = \begin{cases} 0, & x < 2, \\ 1, & x \ge 2, \end{cases}$$

at all continuity points of $F$. Since $F$ is the df of the rv degenerate at $x = 2$,
$F_n \xrightarrow{w} F$.

The following result is easy to prove.


Theorem 1. Let $X_n$ be a sequence of integer-valued rv's. Also, let
$f_n(k) = P\{X_n = k\}$, $k = 0, 1, 2, \ldots$, be the pmf of $X_n$, $n = 1, 2, \ldots$, and
$f(k) = P\{X = k\}$ be the pmf of $X$. Then

$$f_n(x) \to f(x) \text{ for all } x \iff X_n \xrightarrow{L} X.$$


In the continuous case we state the following result of Scheffé [110]
without proof.

Theorem 2. Let $X_n$, $n = 1, 2, \ldots$, and $X$ be continuous rv's such that

$$f_n(x) \to f(x) \quad \text{for (almost) all } x \text{ as } n \to \infty.$$

Here $f_n$ and $f$ are the pdf's of $X_n$ and $X$, respectively. Then $X_n \xrightarrow{L} X$.

The following result is easy to establish.


Theorem 3. Let $\{X_n\}$ be a sequence of rv's such that $X_n \xrightarrow{L} X$, and let $c$
be a constant. Then


(a) $X_n + c \xrightarrow{L} X + c$,
(b) $cX_n \xrightarrow{L} cX$, $c \ne 0$.


A slightly stronger concept of convergence is defined by convergence in 
probability. 


Definition 2. Let $\{X_n\}$ be a sequence of rv's defined on some probability
space $(\Omega, \mathcal{S}, P)$. We say that the sequence $\{X_n\}$ converges in probability
to the rv $X$ if, for every $\varepsilon > 0$,

$$P\{|X_n - X| > \varepsilon\} \to 0 \quad \text{as } n \to \infty. \tag{2}$$

We write $X_n \xrightarrow{P} X$.


Remark 1. We emphasize that the definition says nothing about the
convergence of the rv's $X_n$ to the rv $X$ in the sense in which it is understood
in real analysis. Thus $X_n \xrightarrow{P} X$ does not imply that, given $\varepsilon > 0$, we can
find an $N$ such that $|X_n - X| < \varepsilon$ for $n \ge N$. Definition 2 speaks only of
the convergence of the sequence of probabilities $P\{|X_n - X| > \varepsilon\}$ to 0.


Example 5. Let $\{X_n\}$ be a sequence of rv's with pmf

$$P\{X_n = 1\} = \frac{1}{n}, \qquad P\{X_n = 0\} = 1 - \frac{1}{n}.$$

Then

$$P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = 1\} = \dfrac{1}{n} & \text{if } 0 < \varepsilon < 1, \\ 0 & \text{if } \varepsilon \ge 1. \end{cases}$$

It follows that $P\{|X_n| > \varepsilon\} \to 0$ as $n \to \infty$, and we conclude that
$X_n \xrightarrow{P} 0$.


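The truth of the following statements can easily be verified.

1. $X_n \xrightarrow{P} X \iff X_n - X \xrightarrow{P} 0$.

2. $X_n \xrightarrow{P} X$, $X_n \xrightarrow{P} Y \Rightarrow P\{X = Y\} = 1$,
for

$$P\{|X - Y| > c\} \le P\left\{|X_n - X| > \frac{c}{2}\right\} + P\left\{|X_n - Y| > \frac{c}{2}\right\},$$

and it follows that $P\{|X - Y| > c\} = 0$ for every $c > 0$.

3. $X_n \xrightarrow{P} X \Rightarrow X_n - X_m \xrightarrow{P} 0$ as $n, m \to \infty$,
for

$$P\{|X_n - X_m| > \varepsilon\} \le P\left\{|X_n - X| > \frac{\varepsilon}{2}\right\} + P\left\{|X_m - X| > \frac{\varepsilon}{2}\right\}.$$

4. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_n \pm Y_n \xrightarrow{P} X \pm Y$.

5. $X_n \xrightarrow{P} X$, $k$ constant, $\Rightarrow kX_n \xrightarrow{P} kX$.

6. $X_n \xrightarrow{P} k \Rightarrow X_n^2 \xrightarrow{P} k^2$.

7. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, $a$, $b$ constants $\Rightarrow X_nY_n \xrightarrow{P} ab$,
for

$$X_nY_n = \frac{(X_n + Y_n)^2}{4} - \frac{(X_n - Y_n)^2}{4} \xrightarrow{P} \frac{(a + b)^2}{4} - \frac{(a - b)^2}{4} = ab.$$

8. $X_n \xrightarrow{P} 1 \Rightarrow X_n^{-1} \xrightarrow{P} 1$,
for

$$\begin{aligned}
P\left\{\left|\frac{1}{X_n} - 1\right| \ge \varepsilon\right\} &= P\left\{\frac{1}{X_n} \ge 1 + \varepsilon\right\} + P\left\{\frac{1}{X_n} \le 1 - \varepsilon\right\} \\
&= P\left\{\frac{1}{X_n} \ge 1 + \varepsilon\right\} + P\left\{\frac{1}{X_n} \le 0\right\} + P\left\{0 < \frac{1}{X_n} \le 1 - \varepsilon\right\},
\end{aligned}$$

and each of the three terms on the right goes to 0 as $n \to \infty$.

9. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, $a$, $b$ constants, $b \ne 0 \Rightarrow X_nY_n^{-1} \xrightarrow{P} ab^{-1}$.

10. $X_n \xrightarrow{P} X$, and $Y$ an rv $\Rightarrow X_nY \xrightarrow{P} XY$.
Note that $Y$ is an rv so that, given $\delta > 0$, there exists a $k > 0$ such that
$P\{|Y| > k\} < \delta/2$. Thus

$$\begin{aligned}
P\{|X_nY - XY| > \varepsilon\} &= P\{|X_n - X||Y| > \varepsilon, |Y| > k\} + P\{|X_n - X||Y| > \varepsilon, |Y| \le k\} \\
&< \frac{\delta}{2} + P\left\{|X_n - X| > \frac{\varepsilon}{k}\right\}.
\end{aligned}$$

11. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_nY_n \xrightarrow{P} XY$,
for

$$(X_n - X)(Y_n - Y) \xrightarrow{P} 0.$$

The result now follows on multiplication, using result 10. It also follows that
$X_n \xrightarrow{P} X \Rightarrow X_n^2 \xrightarrow{P} X^2$.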


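Theorem 4. Let $X_n \xrightarrow{P} X$, and $g$ be a continuous function defined on $\mathcal{R}$. Then
$g(X_n) \xrightarrow{P} g(X)$ as $n \to \infty$.


Proof. Since $X$ is an rv, we can, given $\varepsilon > 0$, find a constant $k = k(\varepsilon)$ such
that

$$P\{|X| > k\} \le \frac{\varepsilon}{2}.$$

Also, $g$ is continuous on $\mathcal{R}$, so that $g$ is uniformly continuous on $[-k, k]$.
It follows that there exists a $\delta = \delta(\varepsilon, k)$ such that

$$|g(x_n) - g(x)| < \varepsilon$$

whenever $|x| \le k$ and $|x_n - x| < \delta$. Let

$$A = \{|X| \le k\}, \quad B = \{|X_n - X| < \delta\}, \quad C = \{|g(X_n) - g(X)| < \varepsilon\}.$$

Then $\omega \in A \cap B \Rightarrow \omega \in C$, so that

$$A \cap B \subseteq C.$$

It follows that

$$P\{C^c\} \le P\{A^c\} + P\{B^c\},$$

that is,

$$P\{|g(X_n) - g(X)| \ge \varepsilon\} \le P\{|X_n - X| \ge \delta\} + P\{|X| > k\} \le \varepsilon$$

for $n \ge N(\varepsilon, \delta, k)$, where $N(\varepsilon, \delta, k)$ is chosen so that

$$P\{|X_n - X| \ge \delta\} < \frac{\varepsilon}{2} \quad \text{for } n \ge N(\varepsilon, \delta, k).$$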

Corollary. $X_n \xrightarrow{P} c$, where $c$ is a constant $\Rightarrow g(X_n) \xrightarrow{P} g(c)$, $g$ being a
continuous function.


We remark that a more general result than Theorem 4 is true and state it
without proof (see Rao [97], 104): $X_n \xrightarrow{L} X$, and $g$ continuous on $\mathcal{R} \Rightarrow
g(X_n) \xrightarrow{L} g(X)$.

The following two theorems explain the relationship between weak con- 
vergence and convergence in probability. 


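Theorem 5. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.


Proof. Let $F_n$ and $F$, respectively, be the df's of $X_n$ and $X$. We have

$$\begin{aligned}
\{\omega : X(\omega) \le x'\} &= \{\omega : X_n(\omega) \le x, X(\omega) \le x'\} \cup \{\omega : X_n(\omega) > x, X(\omega) \le x'\} \\
&\subseteq \{X_n \le x\} \cup \{X_n > x, X \le x'\}.
\end{aligned}$$

It follows that

$$F(x') \le F_n(x) + P\{X_n > x, X \le x'\}.$$

Since $X_n - X \xrightarrow{P} 0$, we have for $x' < x$

$$P\{X_n > x, X \le x'\} \le P\{|X_n - X| > x - x'\} \to 0 \quad \text{as } n \to \infty.$$

Therefore

$$F(x') \le \varliminf_{n \to \infty} F_n(x), \quad x' < x.$$

Similarly, by interchanging $X$ and $X_n$ and $x$ and $x'$, we get

$$\varlimsup_{n \to \infty} F_n(x) \le F(x''), \quad x < x''.$$

Thus, for $x' < x < x''$, we have

$$F(x') \le \varliminf F_n(x) \le \varlimsup F_n(x) \le F(x'').$$

Since $F$ has only a countable number of discontinuity points, we choose
$x$ to be a point of continuity of $F$, and letting $x'' \downarrow x$ and $x' \uparrow x$, we have

$$F(x) = \lim_{n \to \infty} F_n(x)$$

at all points of continuity of $F$.

Theorem 6. Let $k$ be a constant. Then

$$X_n \xrightarrow{L} k \Rightarrow X_n \xrightarrow{P} k.$$

Proof. The proof is left as an exercise.

Corollary. Let $k$ be a constant. Then

$$X_n \xrightarrow{L} k \iff X_n \xrightarrow{P} k.$$

Remark 2. We emphasize that we cannot improve the above result by
replacing $k$ by an rv; that is, $X_n \xrightarrow{L} X$ in general does not imply $X_n \xrightarrow{P} X$,
for let $X, X_1, X_2, \ldots$ be identically distributed rv's, and let the joint
distribution of $(X_n, X)$ be as follows (the entries are the ones forced by the
computation below together with the requirement that $X_n$ and $X$ be identically distributed):

                X = 0     X = 1
    X_n = 0       0        1/2
    X_n = 1      1/2        0

Clearly, $X_n \xrightarrow{L} X$. But

$$\begin{aligned}
P\{|X_n - X| > \tfrac{1}{2}\} &\ge P\{|X_n - X| = 1\} \\
&= P\{X_n = 0, X = 1\} + P\{X_n = 1, X = 0\} \\
&= 1 \nrightarrow 0.
\end{aligned}$$

Hence $X_n \overset{P}{\nrightarrow} X$, but $X_n \xrightarrow{L} X$.
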


Remark 3. Example 3 shows that $X_n \xrightarrow{L} X$ does not imply $EX_n^k \to EX^k$
for any $k > 0$, $k$ integral.


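Definition 3. Let $\{X_n\}$ be a sequence of rv's such that $E|X_n|^r < \infty$, for
some $r > 0$. We say that $X_n$ converges in the $r$th mean to an rv $X$ if
$E|X|^r < \infty$ and

$$E|X_n - X|^r \to 0 \quad \text{as } n \to \infty, \tag{3}$$

and we write $X_n \xrightarrow{r} X$.

Example 6. Let $\{X_n\}$ be a sequence of rv's defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = 1\} = \frac{1}{n}, \qquad n = 1, 2, \ldots.$$

Then

$$E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty,$$

and we see that $X_n \xrightarrow{2} X$, where the rv $X$ is degenerate at 0.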


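Theorem 7. Let $X_n \xrightarrow{r} X$ for some $r > 0$. Then $X_n \xrightarrow{P} X$.


Proof. The proof is left as an exercise.


Example 7. Let $\{X_n\}$ be a sequence of rv's defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n^r}, \qquad P\{X_n = n\} = \frac{1}{n^r}, \qquad r > 0, \; n = 1, 2, \ldots.$$

Then $E|X_n|^r = 1$, so that $X_n \overset{r}{\nrightarrow} 0$. We show that $X_n \xrightarrow{P} 0$:

$$P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = n\} & \text{if } \varepsilon < n, \\ 0 & \text{if } \varepsilon \ge n, \end{cases} \quad \to 0 \quad \text{as } n \to \infty.$$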

Theorem 8. Let $\{X_n\}$ be a sequence of rv's such that $X_n \xrightarrow{2} X$. Then
$EX_n \to EX$ and $EX_n^2 \to EX^2$ as $n \to \infty$.


Proof. We have

$$|E(X_n - X)| \le E|X_n - X| \le E^{1/2}|X_n - X|^2 \to 0 \quad \text{as } n \to \infty.$$

To see that $EX_n^2 \to EX^2$ (see also Theorem 9), we write

$$EX_n^2 = E(X_n - X)^2 + EX^2 + 2E\{X(X_n - X)\}$$

and note that

$$|E\{X(X_n - X)\}| \le \sqrt{EX^2\,E(X_n - X)^2}$$

by the Cauchy-Schwarz inequality. The result follows on passing to the
limits.


We get, in addition, that $X_n \xrightarrow{2} X$ implies $\operatorname{var}(X_n) \to \operatorname{var}(X)$.


Corollary. Let $\{X_m\}$, $\{Y_n\}$ be two sequences of rv's such that $X_m \xrightarrow{2} X$,
$Y_n \xrightarrow{2} Y$. Then $E(X_mY_n) \to E(XY)$ as $m, n \to \infty$.


Proof. The proof is left to the reader.


As a simple consequence of Theorem 8 and its corollary we see that
$X_n \xrightarrow{2} X$, $Y_n \xrightarrow{2} Y$ together imply $\operatorname{cov}(X_n, Y_n) \to \operatorname{cov}(X, Y)$.


Theorem 9. If $X_n \xrightarrow{r} X$, then $E|X_n|^r \to E|X|^r$.

Proof. Let $0 < r \le 1$. Then

$$E|X_n|^r = E|X_n - X + X|^r,$$

so that

$$E|X_n|^r - E|X|^r \le E|X_n - X|^r.$$

Interchanging $X_n$ and $X$, we get

$$E|X|^r - E|X_n|^r \le E|X_n - X|^r.$$

It follows that

$$\left|E|X|^r - E|X_n|^r\right| \le E|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$

For $r > 1$, we use Minkowski's inequality and obtain

$$[E|X_n|^r]^{1/r} \le [E|X_n - X|^r]^{1/r} + [E|X|^r]^{1/r}$$

and

$$[E|X|^r]^{1/r} \le [E|X_n - X|^r]^{1/r} + [E|X_n|^r]^{1/r}.$$

It follows that

$$\left|E^{1/r}|X_n|^r - E^{1/r}|X|^r\right| \le E^{1/r}|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$

This completes the proof.
Theorem 10. Let $r > s$. Then $X_n \xrightarrow{r} X \Rightarrow X_n \xrightarrow{s} X$.


Proof. From Theorem 3.4.3 it follows that for $s < r$

$$E|X_n - X|^s \le [E|X_n - X|^r]^{s/r} \to 0 \quad \text{as } n \to \infty,$$

since $X_n \xrightarrow{r} X$.


Remark 4. Clearly the converse to Theorem 10 cannot hold, since $E|X|^s < \infty$
for $s < r$ does not imply $E|X|^r < \infty$.


Remark 5. In view of Theorem 9, it follows that $X_n \xrightarrow{r} X \Rightarrow E|X_n|^s \to
E|X|^s$ for $s \le r$.

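Definition 4.† Let $\{X_n\}$ be a sequence of rv's. We say that $X_n$ converges
almost surely (a.s.) to an rv $X$ if and only if

$$P\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\} = 1, \tag{4}$$

and we write $X_n \xrightarrow{a.s.} X$ or $X_n \to X$ with probability 1.


The following result elucidates Definition 4.


Theorem 11. $X_n \xrightarrow{a.s.} X$ if and only if $\lim_{n \to \infty} P\{\sup_{m \ge n} |X_m - X| > \varepsilon\} = 0$ for all
$\varepsilon > 0$.


†May be omitted on the first reading.

Proof. Since $X_n \xrightarrow{a.s.} X \iff X_n - X \xrightarrow{a.s.} 0$, it will be sufficient to show the
equivalence of

(a) $X_n \xrightarrow{a.s.} 0$ and (b) $\lim_{n \to \infty} P\{\sup_{m \ge n} |X_m| > \varepsilon\} = 0$.

Let us suppose that (a) holds. Let $\varepsilon > 0$, and write

$$A_n(\varepsilon) = \left\{\sup_{m \ge n} |X_m| > \varepsilon\right\} \quad \text{and} \quad C = \left\{\lim_{n \to \infty} X_n = 0\right\}.$$

Also write $B_n(\varepsilon) = C \cap A_n(\varepsilon)$, and note that $B_{n+1}(\varepsilon) \subset B_n(\varepsilon)$, and the limit
set $\bigcap_{n=1}^{\infty} B_n(\varepsilon) = \emptyset$. It follows that

$$\lim_{n \to \infty} PB_n(\varepsilon) = P\left\{\bigcap_{n=1}^{\infty} B_n(\varepsilon)\right\} = 0.$$

Since $PC = 1$, $PC^c = 0$, and we have

$$\begin{aligned}
PB_n(\varepsilon) &= P(A_n \cap C) = 1 - P(C^c \cup A_n^c) \\
&= 1 - PC^c - PA_n^c + P(C^c \cap A_n^c) \\
&= PA_n + P(C^c \cap A_n^c) \\
&= PA_n.
\end{aligned}$$

It follows that (b) holds.
Conversely, let $\lim_{n \to \infty} PA_n(\varepsilon) = 0$, and write

$$D(\varepsilon) = \left\{\varlimsup_{n \to \infty} |X_n| > \varepsilon > 0\right\}.$$

Since $D(\varepsilon) \subset A_n(\varepsilon)$ for $n = 1, 2, \ldots$, it follows that $PD(\varepsilon) = 0$. Also,

$$C^c = \left\{\lim_{n \to \infty} X_n \ne 0\right\} \subset \bigcup_{k=1}^{\infty}\left\{\varlimsup_{n \to \infty} |X_n| > \frac{1}{k}\right\},$$

so that

$$1 - PC \le \sum_{k=1}^{\infty} PD\left(\frac{1}{k}\right) = 0,$$

and (a) holds.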
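Remark 6. Thus $X_n \xrightarrow{a.s.} 0$ means that, for $\varepsilon > 0$, $\eta > 0$ arbitrary, we can
find an $n_0$ such that

$$P\left\{\sup_{n \ge n_0} |X_n| > \varepsilon\right\} < \eta. \tag{5}$$

Indeed, we can write, equivalently, that

$$\lim_{n_0 \to \infty} P\left[\bigcup_{n \ge n_0} \{|X_n| > \varepsilon\}\right] = 0. \tag{6}$$


Theorem 12. $X_n \xrightarrow{a.s.} X \Rightarrow X_n \xrightarrow{P} X$.

Proof. By Remark 6, $X_n \xrightarrow{a.s.} X$ implies that, for arbitrary $\varepsilon > 0$, $\eta > 0$, we
can choose an $n_0 = n_0(\varepsilon, \eta)$ such that

$$P\left[\bigcap_{n = n_0}^{\infty} \{|X_n - X| \le \varepsilon\}\right] \ge 1 - \eta.$$

Clearly

$$\bigcap_{n = n_0}^{\infty} \{|X_n - X| \le \varepsilon\} \subset \{|X_n - X| \le \varepsilon\} \quad \text{for } n \ge n_0.$$

It follows that for $n \ge n_0$

$$P\{|X_n - X| \le \varepsilon\} \ge P\left[\bigcap_{n = n_0}^{\infty} \{|X_n - X| \le \varepsilon\}\right] \ge 1 - \eta,$$

that is,

$$P\{|X_n - X| > \varepsilon\} \le \eta \quad \text{for } n \ge n_0,$$

which is the same as saying $X_n \xrightarrow{P} X$.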


That the converse of Theorem 12 does not hold is shown in the following 
example. 


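Example 8. For each positive integer $n$ there exist integers $m$ and $k$ (uniquely
determined) such that

$$n = 2^k + m, \quad 0 \le m < 2^k, \quad k = 0, 1, 2, \ldots.$$

Thus, for $n = 1$, $k = 0$ and $m = 0$; for $n = 5$, $k = 2$ and $m = 1$; and so
on. Define rv's $X_n$, for $n = 1, 2, \ldots$, on $\Omega = [0, 1]$ by

$$X_n(\omega) = \begin{cases} 2^k, & \dfrac{m}{2^k} \le \omega < \dfrac{m + 1}{2^k}, \\ 0, & \text{otherwise}. \end{cases}$$

Let the probability distribution of $X_n$ be given by $P(I) =$ length of the
interval $I \subseteq \Omega$. Thus

$$P\{X_n = 2^k\} = \frac{1}{2^k}, \qquad P\{X_n = 0\} = 1 - \frac{1}{2^k}.$$

The limit $\lim_{n \to \infty} X_n(\omega)$ does not exist for any $\omega \in \Omega$, so that $X_n$ does not
converge almost surely. But

$$P\{|X_n| > \varepsilon\} = P\{X_n > \varepsilon\} = \begin{cases} 0 & \text{if } \varepsilon \ge 2^k, \\ \dfrac{1}{2^k} & \text{if } 0 < \varepsilon < 2^k, \end{cases}$$

and we see that

$$P\{|X_n| > \varepsilon\} \to 0 \quad \text{as } n \text{ (and hence } k) \to \infty.$$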
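The sequence of Example 8 is simple to tabulate. The sketch below computes $k$ and $m$ from $n$, evaluates $X_n(\omega)$ at an illustrative point $\omega = 0.3$, and shows that within each block $2^k \le n < 2^{k+1}$ exactly one $X_n(\omega)$ is nonzero, while $P\{X_n \ne 0\} = 2^{-k}$ shrinks. The function names are hypothetical.

```python
def decompose(n):
    """Return (k, m) with n = 2**k + m and 0 <= m < 2**k."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def x_n(n, omega):
    """X_n(omega) = 2**k on [m / 2**k, (m + 1) / 2**k) and 0 elsewhere."""
    k, m = decompose(n)
    return 2 ** k if m / 2 ** k <= omega < (m + 1) / 2 ** k else 0

def prob_nonzero(n):
    """P{X_n != 0}: the length of the interval on which X_n equals 2**k."""
    k, _ = decompose(n)
    return 1 / 2 ** k

omega = 0.3  # illustrative sample point
for k in range(5):
    block = range(2 ** k, 2 ** (k + 1))
    hits = [n for n in block if x_n(n, omega) != 0]
    print(k, hits, prob_nonzero(2 ** k))
```

At this $\omega$ the values $X_n(\omega)$ return to 0 and jump to ever larger powers of 2 infinitely often, so they have no limit, even though the probability of a nonzero value tends to 0.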


Theorem 13. Let $\{X_n\}$ be a strictly decreasing sequence of positive rv's, and
suppose that $X_n \xrightarrow{P} 0$. Then $X_n \xrightarrow{a.s.} 0$.


Proof. The proof is left as an exercise.
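Example 9. Let $\{X_n\}$ be a sequence of independent rv's defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = 1\} = \frac{1}{n}.$$

Then

$$E|X_n - 0|^2 = E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty,$$

so that $X_n \xrightarrow{2} 0$. Also

$$P\{X_n = 0 \text{ for every } m \le n \le n_0\} = \prod_{n=m}^{n_0}\left(1 - \frac{1}{n}\right) = \frac{m - 1}{n_0},$$

which diverges to zero as $n_0 \to \infty$ for all values of $m$ (see P.2.9). Thus $X_n$
does not converge to 0 with probability 1.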
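The telescoping product in Example 9 can be confirmed with exact rational arithmetic. A minimal sketch (the function name is illustrative):

```python
from fractions import Fraction

def prob_all_zero(m, n0):
    """Product of (1 - 1/n) for n = m, ..., n0, computed exactly."""
    p = Fraction(1)
    for n in range(m, n0 + 1):
        p *= 1 - Fraction(1, n)
    return p

for n0 in (10, 100, 1000):
    # Equals (m - 1) / n0, which tends to 0 as n0 grows.
    print(n0, prob_all_zero(5, n0))
```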


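Example 10. Let $\{X_n\}$ be independent rv's defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n^r}, \qquad P\{X_n = n\} = \frac{1}{n^r}, \qquad r \ge 2, \; n = 1, 2, \ldots.$$

Then

$$P\{X_n = 0 \text{ for } m \le n \le n_0\} = \prod_{n=m}^{n_0}\left(1 - \frac{1}{n^r}\right).$$

As $n_0 \to \infty$, the infinite product converges to some nonzero quantity (see
P.2.9), which itself converges to 1 as $m \to \infty$. Thus $X_n \xrightarrow{a.s.} 0$. However,
$E|X_n|^r = 1$ and $X_n \overset{r}{\nrightarrow} 0$ as $n \to \infty$.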
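Example 11. Let $\{X_n\}$ be a sequence of rv's with $P\{X_n = \pm 1/n\} = \tfrac{1}{2}$. Then
$E|X_n|^r = 1/n^r \to 0$ as $n \to \infty$, and $X_n \xrightarrow{r} 0$. For $j < k$, $|X_j| > |X_k|$, so that
$\{|X_k| > \varepsilon\} \subset \{|X_j| > \varepsilon\}$. It follows that

$$\bigcup_{j=n}^{\infty} \{|X_j| > \varepsilon\} = \{|X_n| > \varepsilon\}.$$

Choosing $n > 1/\varepsilon$, we see that

$$P\left[\bigcup_{j=n}^{\infty} \{|X_j| > \varepsilon\}\right] = P\{|X_n| > \varepsilon\} \le P\left\{|X_n| > \frac{1}{n}\right\} = 0,$$

and (6) implies that $X_n \xrightarrow{a.s.} 0$.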

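Theorem 14. Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence of rv's. Then

$$|X_n - Y_n| \xrightarrow{P} 0 \text{ and } Y_n \xrightarrow{L} Y \Rightarrow X_n \xrightarrow{L} Y.$$


Proof. Let $x$ be a point of continuity of the df of $Y$ and $\varepsilon > 0$. Then

$$\begin{aligned}
P\{X_n \le x\} &= P\{Y_n \le x + Y_n - X_n\} \\
&= P\{Y_n \le x + Y_n - X_n; Y_n - X_n \le \varepsilon\} + P\{Y_n \le x + Y_n - X_n; Y_n - X_n > \varepsilon\} \\
&\le P\{Y_n \le x + \varepsilon\} + P\{Y_n - X_n > \varepsilon\}.
\end{aligned}$$

It follows that

$$\varlimsup_{n \to \infty} P\{X_n \le x\} \le \varlimsup_{n \to \infty} P\{Y_n \le x + \varepsilon\}.$$

Similarly

$$\varliminf_{n \to \infty} P\{X_n \le x\} \ge \varliminf_{n \to \infty} P\{Y_n \le x - \varepsilon\}.$$

Since $\varepsilon > 0$ is arbitrary and $x$ is a continuity point of $P\{Y \le x\}$, we get
the result by letting $\varepsilon \to 0$.


Corollary. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.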


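Theorem 15 (Cramér [18], 254). Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence
of pairs of rv's, and let $c$ be a constant. Then

(a) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n \pm Y_n \xrightarrow{L} X \pm c$;
(b) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_nY_n \xrightarrow{L} cX$ if $c \ne 0$, and $X_nY_n \xrightarrow{P} 0$ if $c = 0$;
(c) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n/Y_n \xrightarrow{L} X/c$ if $c \ne 0$.

Proof.

(a) $X_n \xrightarrow{L} X \Rightarrow X_n \pm c \xrightarrow{L} X \pm c$ (Theorem 3). Also, $Y_n - c = (Y_n \pm X_n)
- (X_n \pm c) \xrightarrow{P} 0$.
A simple use of Theorem 14 shows that

$$X_n \pm Y_n \xrightarrow{L} X \pm c.$$

(b) We first consider the case where $c = 0$. We have, for any fixed number
$k > 0$,

$$\begin{aligned}
P\{|X_nY_n| > \varepsilon\} &= P\left\{|X_nY_n| > \varepsilon, |Y_n| \le \frac{\varepsilon}{k}\right\} + P\left\{|X_nY_n| > \varepsilon, |Y_n| > \frac{\varepsilon}{k}\right\} \\
&\le P\{|X_n| > k\} + P\left\{|Y_n| > \frac{\varepsilon}{k}\right\}.
\end{aligned}$$

Since $Y_n \xrightarrow{P} 0$ and $X_n \xrightarrow{L} X$, it follows that, for any fixed $k > 0$,

$$\varlimsup_{n \to \infty} P\{|X_nY_n| > \varepsilon\} \le P\{|X| > k\}.$$

Since $k$ is arbitrary, we can make $P\{|X| > k\}$ as small as we please by
choosing $k$ large. It follows that

$$X_nY_n \xrightarrow{P} 0.$$

Now, let $c \ne 0$. Then

$$X_nY_n - cX_n = X_n(Y_n - c)$$

and, since $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c$, $X_n(Y_n - c) \xrightarrow{P} 0$. Using Theorem 14, we get the
result that

$$X_nY_n \xrightarrow{L} cX.$$

(c) $Y_n \xrightarrow{P} c$ and $c \ne 0 \Rightarrow Y_n^{-1} \xrightarrow{P} c^{-1}$. It follows that $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow
X_nY_n^{-1} \xrightarrow{L} c^{-1}X$, and the proof of the theorem is complete.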


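As an application of Theorem 15 we present the following example.


Example 12. Let $X_1, X_2, \ldots$ be iid rv's with common law $\mathcal{N}(0, 1)$. We shall
determine the limiting distribution of the rv

$$W_n = \sqrt{n}\, \frac{X_1 + X_2 + \cdots + X_n}{X_1^2 + X_2^2 + \cdots + X_n^2}.$$

Let us write

$$U_n = \frac{1}{\sqrt{n}} (X_1 + X_2 + \cdots + X_n) \quad \text{and} \quad V_n = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n}.$$

Then

$$W_n = \frac{U_n}{V_n}.$$

For the mgf of $U_n$ we have

$$M_{U_n}(t) = \prod_{i=1}^{n} E e^{tX_i/\sqrt{n}} = \prod_{i=1}^{n} e^{t^2/2n} = e^{t^2/2},$$

so that $U_n$ is an $\mathcal{N}(0, 1)$ variate (see also Corollary 2 to Theorem 5.3.25).
It follows that $U_n \xrightarrow{L} Z$, where $Z$ is an $\mathcal{N}(0, 1)$ rv. As for $V_n$, we note that
each $X_i^2$ is a chi-square variate with 1 d.f. Thus

$$M_{V_n}(t) = \prod_{i=1}^{n} \Big(1 - \frac{2t}{n}\Big)^{-1/2} = \Big(1 - \frac{2t}{n}\Big)^{-n/2}, \qquad t < \frac{n}{2},$$

which is the mgf of a gamma variate with parameters $\alpha = n/2$ and
$\beta = 2/n$. Thus the density function of $V_n$ is given by

$$f_{V_n}(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\,(2/n)^{n/2}}\, x^{n/2-1} e^{-nx/2}, & 0 < x < \infty, \\ 0, & \text{otherwise.} \end{cases}$$

We will show that $V_n \xrightarrow{P} 1$. We have, for any $\varepsilon > 0$,

$$P\{|V_n - 1| > \varepsilon\} \le \frac{\operatorname{var}(V_n)}{\varepsilon^2} = \Big(\frac{n}{2}\Big)\Big(\frac{2}{n}\Big)^2 \frac{1}{\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$

We have thus shown that

$$U_n \xrightarrow{L} Z \quad \text{and} \quad V_n \xrightarrow{P} 1.$$

It follows by Theorem 15 (c) that $W_n = U_n/V_n \xrightarrow{L} Z$, where $Z$ is an
$\mathcal{N}(0, 1)$ rv.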
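
The conclusion of Example 12 can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the function names, sample sizes, and seed are illustrative assumptions. It draws repeated values of $W_n$ and estimates $P\{|W_n| \le 1.96\}$, which should be close to the standard normal value of about 0.95 if $W_n$ is approximately $\mathcal{N}(0, 1)$.

```python
import math
import random


def simulate_w(n, rng):
    """Draw one value of W_n = sqrt(n) * sum(X_i) / sum(X_i^2), X_i iid N(0, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return math.sqrt(n) * sum(xs) / sum(x * x for x in xs)


def fraction_within(n, reps, bound, seed=0):
    """Empirical estimate of P{|W_n| <= bound} from `reps` replications."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    hits = sum(1 for _ in range(reps) if abs(simulate_w(n, rng)) <= bound)
    return hits / reps


if __name__ == "__main__":
    # For a standard normal Z, P{|Z| <= 1.96} is approximately 0.95.
    print(fraction_within(n=400, reps=2000, bound=1.96))
```

The estimate is only approximate for finite $n$, since $V_n$ still fluctuates around 1.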


For another example see Section 7.3. 


PROBLEMS 6.2 


1. Let $X_1, X_2, \ldots$ be a sequence of rv's with corresponding dfs given by
$F_n(x) = 0$ if $x < -n$, $= (x + n)/2n$ if $-n \le x < n$, and $= 1$ if $x \ge n$. Does $F_n$
converge to a df?

2. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(0, 1)$ rv's. Consider the sequence of rv's $\{\bar{X}_n\}$, where
$\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$. Let $F_n$ be the df of $\bar{X}_n$, $n = 1, 2, \ldots$. Find $\lim_{n\to\infty} F_n(x)$. Is this
limit a df?

3. Let $X_1, X_2, \ldots$ be iid $U(0, \theta)$ rv's. Let $N_n = \min(X_1, X_2, \ldots, X_n)$, and consider
the sequence $Y_n = nN_n$. Does $Y_n$ converge in distribution to some rv $Y$? If so,
find the df of rv $Y$.

4. Let $X_1, X_2, \ldots$ be iid rv's with common absolutely continuous df $F$. Let
$M_n = \max(X_1, X_2, \ldots, X_n)$, and consider the sequence of rv's $Y_n = n[1 - F(M_n)]$.
Find the limiting df of $Y_n$.

5. Let $X_1, X_2, \ldots$ be a sequence of iid rv's with common pdf $f(x) = e^{-x+\theta}$ if $x \ge \theta$,
and $= 0$ if $x < \theta$. Write $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$.

(a) Show that $\bar{X}_n \xrightarrow{P} 1 + \theta$.

(b) Show that $\min(X_1, X_2, \ldots, X_n) \xrightarrow{P} \theta$.

6. Let $X_1, X_2, \ldots$ be iid $U[0, \theta]$ rv's. Show that $\max(X_1, X_2, \ldots, X_n) \xrightarrow{P} \theta$.


7. Let $\{X_n\}$ be a sequence of rv's such that $X_n \xrightarrow{L} X$. Let $a_n$ be a sequence of
positive constants such that $a_n \to \infty$ as $n \to \infty$. Show that $a_n^{-1} X_n \xrightarrow{P} 0$.

8. Let $\{X_n\}$ be a sequence of rv's such that $P\{|X_n| \le k\} = 1$ for all $n$ and some
constant $k > 0$. Suppose that $X_n \xrightarrow{P} X$. Show that $X_n \xrightarrow{r} X$ for any $r > 0$.


9. Let $X_1, X_2, \ldots, X_{2n}$ be iid $\mathcal{N}(0, 1)$ rv's. Define

$$U_n = \frac{X_1}{X_2} + \frac{X_3}{X_4} + \cdots + \frac{X_{2n-1}}{X_{2n}}, \qquad V_n = X_1^2 + X_2^2 + \cdots + X_n^2, \qquad \text{and} \qquad Z_n = \frac{U_n}{V_n}.$$

Find the limiting distribution of $Z_n$.

10. Let $\{X_n\}$ be a sequence of geometric rv's with parameter $\lambda/n$, $n > \lambda > 0$.
Also, let $Z_n = X_n/n$. Show that $Z_n \xrightarrow{L} G(1, 1/\lambda)$ as $n \to \infty$. (Prochaska [91])

11. Let $X_n$ be a sequence of rv's such that $X_n \xrightarrow{\text{a.s.}} 0$, and let $c_n$ be a sequence of
real numbers such that $c_n \to 0$ as $n \to \infty$. Show that $X_n + c_n \xrightarrow{\text{a.s.}} 0$.

12. Does convergence almost surely imply convergence of moments?

13. Let $X_1, X_2, \ldots$ be a sequence of iid rv's with common df $F$, and write
$M_n = \max\{X_1, X_2, \ldots, X_n\}$, $n = 1, 2, \ldots$.

(a) For $\alpha > 0$, $\lim_{x\to\infty} x^{\alpha} P\{X_1 > x\} = b > 0$. Find the limiting distribution of
$(bn)^{-1/\alpha} M_n$. Also, find the pdf corresponding to the limiting df and compute
its moments.

(b) If $F$ satisfies

$$\lim_{x\to\infty} e^{x}[1 - F(x)] = b > 0,$$

find the limiting df of $M_n - \log(bn)$ and compute the corresponding pdf and
the mgf.

(c) If $X_1$ is bounded above by $x_0$ with probability 1, and for some $\alpha > 0$

$$\lim_{x \to x_0 -} (x_0 - x)^{-\alpha} [1 - F(x)] = b > 0,$$

find the limiting distribution of $(bn)^{1/\alpha} (M_n - x_0)$, the corresponding pdf, and
the moments of the limiting distribution.
(The above remarkable result, due to Gnedenko [38], exhausts all limiting distributions
of $M_n$ with suitable norming and centering.)


14. Let $\{F_n\}$ be a sequence of df's that converges weakly to a df $F$ which is
continuous everywhere. Show that $F_n(x)$ converges to $F(x)$ uniformly.


15. Prove Theorem 1. 

16. Prove Theorem 6. 

17. Prove Theorem 13. 

18. Prove the corollary to Theorem 8. 


19. Let $V$ be the class of all random variables defined on a probability space
with finite expectations, and for $X \in V$ define

$$\rho(X) = E\Big\{\frac{|X|}{1 + |X|}\Big\}.$$

Show the following:

(a) $\rho(X + Y) \le \rho(X) + \rho(Y)$; $\rho(\sigma X) \le \max(|\sigma|, 1)\, \rho(X)$.

(b) $d(X, Y) = \rho(X - Y)$ is a distance function on $V$ (assuming that we identify rv's
that are a.s. equal).

(c) $\lim_{n\to\infty} d(X_n, X) = 0 \iff X_n \xrightarrow{P} X$.


6.3 THE WEAK LAW OF LARGE NUMBERS 


Let $\{X_n\}$ be a sequence of rv's. Write $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$. In this
section we answer the following question in the affirmative: Do there exist
sequences of constants $A_n$ and $B_n > 0$, $B_n \to \infty$ as $n \to \infty$, such that the
sequence of rv's $B_n^{-1}(S_n - A_n)$ converges in probability to 0 as $n \to \infty$?


Definition 1. Let $\{X_n\}$ be a sequence of rv's, and let $S_n = \sum_{k=1}^{n} X_k$, $n = 1,
2, \ldots$. We say that $\{X_n\}$ obeys the weak law of large numbers (WLLN) with
respect to the sequence of constants $\{B_n\}$, $B_n > 0$, $B_n \uparrow \infty$, if there exists
a sequence of real constants $A_n$ such that $B_n^{-1}(S_n - A_n) \xrightarrow{P} 0$ as $n \to \infty$.
The $A_n$ are called centering constants, and the $B_n$ norming constants.


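Theorem 1. Let $\{X_n\}$ be a sequence of pairwise uncorrelated rv's with
$EX_i = \mu_i$ and $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots$. If $\sum_{i=1}^{n} \sigma_i^2 \to \infty$ as $n \to \infty$, we
can choose $A_n = \sum_{k=1}^{n} \mu_k$ and $B_n = \sum_{i=1}^{n} \sigma_i^2$, that is,

$$\frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sum_{i=1}^{n} \sigma_i^2} \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$

Proof. We have, by Chebychev's inequality,

$$P\Big\{\Big|S_n - \sum_{k=1}^{n} \mu_k\Big| > \varepsilon \sum_{i=1}^{n} \sigma_i^2\Big\} \le \frac{\sum_{i=1}^{n} \sigma_i^2}{\varepsilon^2 \big(\sum_{i=1}^{n} \sigma_i^2\big)^2} = \frac{1}{\varepsilon^2 \sum_{i=1}^{n} \sigma_i^2} \to 0 \quad \text{as } n \to \infty.$$


Corollary 1. If the $X_n$'s are identically distributed and pairwise uncorrelated
with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2 < \infty$, we can choose $A_n = n\mu$ and
$B_n = n\sigma^2$.


Corollary 2. In Theorem 1 we can choose $B_n = n$, provided that $n^{-2} \sum_{i=1}^{n} \sigma_i^2
\to 0$ as $n \to \infty$.


Corollary 3. In Corollary 1, we can take $A_n = n\mu$ and $B_n = n$, since
$n\sigma^2/n^2 \to 0$ as $n \to \infty$.


Thus, if $\{X_n\}$ are pairwise-uncorrelated identically distributed rv's with
finite variance, $S_n/n \xrightarrow{P} \mu$.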


Example 1. Let $X_1, X_2, \ldots$ be iid rv's with common law $b(1, p)$. Then
$EX_i = p$, $\operatorname{var}(X_i) = p(1 - p)$, and we have

$$\frac{S_n}{n} \xrightarrow{P} p \quad \text{as } n \to \infty.$$

Note that $S_n/n$ is the proportion of successes in $n$ trials.
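
As an illustration of Example 1, the sketch below (an assumed setup, not from the text; function names, seed, and parameter values are hypothetical) estimates how often the proportion of successes deviates from $p$ by more than $\varepsilon$ and compares it with the Chebychev bound $p(1-p)/(n\varepsilon^2)$ used in the proof of Theorem 1.

```python
import random


def success_proportion(n, p, rng):
    """Proportion of successes S_n / n in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p) / n


def deviation_frequency(n, p, eps, reps, seed=0):
    """Empirical estimate of P{|S_n/n - p| > eps} over `reps` replications."""
    rng = random.Random(seed)
    bad = sum(1 for _ in range(reps)
              if abs(success_proportion(n, p, rng) - p) > eps)
    return bad / reps


def chebyshev_bound(n, p, eps):
    """Chebychev upper bound var(S_n/n) / eps^2 = p(1-p) / (n eps^2)."""
    return p * (1 - p) / (n * eps ** 2)


if __name__ == "__main__":
    for n in (50, 500, 2000):
        print(n, deviation_frequency(n, 0.3, 0.05, 300), chebyshev_bound(n, 0.3, 0.05))
```

The bound is crude (it exceeds 1 for small $n$), but both it and the observed frequency shrink as $n$ grows, which is all the WLLN asserts.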


Hereafter, we shall be interested mainly in the case where $B_n = n$. When
we say that $\{X_n\}$ obeys the WLLN, this is so with respect to the sequence
$\{n\}$.


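Theorem 2. Let $\{X_n\}$ be any sequence of rv's. Write $Y_n = n^{-1} \sum_{k=1}^{n} X_k$. A necessary
and sufficient condition for the sequence $\{X_n\}$ to satisfy the weak law of large
numbers is that

$$(1) \qquad E\Big\{\frac{Y_n^2}{1 + Y_n^2}\Big\} \to 0 \quad \text{as } n \to \infty.$$


Proof. For any two positive numbers $a$, $b$, $a \ge b > 0$, we have

$$(2) \qquad \Big(\frac{a}{1 + a}\Big)\Big(\frac{1 + b}{b}\Big) \ge 1.$$

Let $A = \{|Y_n| \ge \varepsilon\}$. Then $\omega \in A \Longrightarrow |Y_n|^2 \ge \varepsilon^2 > 0$. Using (2), we see that
$\omega \in A$ implies

$$\frac{Y_n^2}{1 + Y_n^2} \cdot \frac{1 + \varepsilon^2}{\varepsilon^2} \ge 1.$$

It follows that

$$PA \le P\Big\{\frac{Y_n^2}{1 + Y_n^2} \ge \frac{\varepsilon^2}{1 + \varepsilon^2}\Big\} \le E\Big\{\frac{Y_n^2}{1 + Y_n^2}\Big\} \frac{1 + \varepsilon^2}{\varepsilon^2} \quad \text{(by Markov's inequality)}$$
$$\to 0 \quad \text{as } n \to \infty.$$

That is, $Y_n \xrightarrow{P} 0$ as $n \to \infty$.
Conversely, we will show that for every $\varepsilon > 0$

$$(3) \qquad P\{|Y_n| > \varepsilon\} \ge E\Big\{\frac{Y_n^2}{1 + Y_n^2}\Big\} - \varepsilon^2.$$

We will prove (3) for the case in which $Y_n$ is of the continuous type. The
discrete case being similar, we ask the reader to complete the proof. If $Y_n$
has pdf $f_n(y)$, then

$$\int_{-\infty}^{\infty} \frac{y^2}{1 + y^2} f_n(y)\, dy = \Big(\int_{|y| > \varepsilon} + \int_{|y| \le \varepsilon}\Big) \frac{y^2}{1 + y^2} f_n(y)\, dy$$
$$\le P\{|Y_n| > \varepsilon\} + \int_{-\varepsilon}^{\varepsilon} \Big(1 - \frac{1}{1 + y^2}\Big) f_n(y)\, dy$$
$$\le P\{|Y_n| > \varepsilon\} + \frac{\varepsilon^2}{1 + \varepsilon^2} \le P\{|Y_n| > \varepsilon\} + \varepsilon^2,$$

which is (3). Since $\varepsilon$ is arbitrary, $Y_n \xrightarrow{P} 0$ implies (1).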


Remark 1. Since condition (1) applies not to the individual variables but to
their sum, Theorem 2 is of limited use. We note, however, that all the weak 
laws of large numbers obtained as corollaries to Theorem 1 follow easily 
from Theorem 2 (Problem 6). 


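Example 2. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all $i$,
and $\operatorname{cov}(X_i, X_j) = \rho$ if $|j - i| = 1$, and $= 0$ otherwise. Then
$S_n = \sum_{k=1}^{n} X_k$ is $\mathcal{N}(0, \sigma^2)$, where

$$\sigma^2 = \operatorname{var}(S_n) = n + 2(n - 1)\rho,$$

and

$$E\Big\{\frac{Y_n^2}{1 + Y_n^2}\Big\} = \frac{2}{\sigma\sqrt{2\pi}} \int_0^{\infty} \frac{x^2}{n^2 + x^2}\, e^{-x^2/2\sigma^2}\, dx$$
$$= \frac{2}{\sqrt{2\pi}} \int_0^{\infty} \frac{y^2 [n + 2(n-1)\rho]}{n^2 + y^2 [n + 2(n-1)\rho]}\, e^{-y^2/2}\, dy$$
$$\le \frac{n + 2(n-1)\rho}{n^2} \cdot \frac{2}{\sqrt{2\pi}} \int_0^{\infty} y^2 e^{-y^2/2}\, dy \to 0 \quad \text{as } n \to \infty.$$

It follows from Theorem 2 that $n^{-1} S_n \xrightarrow{P} 0$. We invite the reader to
compare this result to that of Problem 6.5.6.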


Example 3. Let $X_1, X_2, \ldots$ be iid $\mathcal{C}(1, 0)$ rv's. We have seen (corollary to
Theorem 5.3.21) that $n^{-1} S_n \sim \mathcal{C}(1, 0)$, so that $n^{-1} S_n$ does not converge
in probability to 0. It follows that the WLLN does not hold. (See also
Problem 10.)
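
The failure of the WLLN in Example 3 is easy to see in simulation. The sketch below is an illustration under assumed choices (inverse-cdf sampling via the tangent function, hypothetical function names and seed): since $n^{-1} S_n$ is again $\mathcal{C}(1, 0)$, the probability $P\{|n^{-1} S_n| > 1\}$ should stay near $1/2$ no matter how large $n$ is.

```python
import math
import random


def cauchy(rng):
    """One C(1, 0) draw by inverting the cdf: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def frac_mean_outside(n, reps, seed=0):
    """Empirical estimate of P{|S_n / n| > 1} for iid C(1, 0) summands."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        mean = sum(cauchy(rng) for _ in range(n)) / n
        if abs(mean) > 1.0:
            count += 1
    return count / reps


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, frac_mean_outside(n, 2000))
```

Averaging does not tighten the distribution at all here, in contrast with the finite-variance case of Theorem 1.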


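Let $X_1, X_2, \ldots$ be an arbitrary sequence of rv's, and let $S_n = \sum_{k=1}^{n} X_k$,
$n = 1, 2, \ldots$. Let us truncate each $X_i$ at $c > 0$, that is, let

$$X_i^c = \begin{cases} X_i, & |X_i| \le c, \\ 0, & |X_i| > c. \end{cases}$$

Write $S_n^c = \sum_{i=1}^{n} X_i^c$, and $m_n = \sum_{i=1}^{n} EX_i^c$.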

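Lemma 1. For any $\varepsilon > 0$,

$$(4) \qquad P\{|S_n - m_n| > \varepsilon\} \le P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^{n} P\{|X_k| > c\}.$$


Proof. We have

$$P\{|S_n - m_n| > \varepsilon\} = P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| \le c \text{ for } k = 1, 2, \ldots, n\}$$
$$\qquad + P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| > c \text{ for at least one } k,\ k = 1, 2, \ldots, n\}$$
$$\le P\{|S_n^c - m_n| > \varepsilon\} + P\{|X_k| > c \text{ for at least one } k,\ 1 \le k \le n\}$$
$$\le P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^{n} P\{|X_k| > c\}.$$

Corollary. If $X_1, X_2, \ldots, X_n$ are identically distributed, then

$$(5) \qquad P\{|S_n - m_n| > \varepsilon\} \le P\{|S_n^c - m_n| > \varepsilon\} + nP\{|X_1| > c\}.$$

If, in addition, the rv's $X_1, X_2, \ldots, X_n$ are independent, then

$$(6) \qquad P\{|S_n - m_n| > \varepsilon\} \le \frac{nE(X_1^c)^2}{\varepsilon^2} + nP\{|X_1| > c\}.$$


Inequality (6) yields the following important theorem.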
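Theorem 3. Let $\{X_n\}$ be a sequence of iid rv's with common finite mean
$\mu = EX_1$. Then

$$n^{-1} S_n \xrightarrow{P} \mu \quad \text{as } n \to \infty.$$


Proof. Let us take $c = n$ in (6) and replace $\varepsilon$ by $n\varepsilon$; then we have

$$P\{|S_n - m_n| > n\varepsilon\} \le \frac{1}{n\varepsilon^2} E(X_1^n)^2 + nP\{|X_1| > n\}.$$

First note that $E|X_1| < \infty \Longrightarrow nP\{|X_1| > n\} \to 0$ as $n \to \infty$. Now (see Lemma
3.2.1)

$$E(X_1^n)^2 = 2\int_0^n xP\{|X_1^n| > x\}\, dx \le 2\Big(\int_0^A + \int_A^n\Big) xP\{|X_1| > x\}\, dx,$$

where $A$ is chosen sufficiently large that

$$xP\{|X_1| > x\} < \frac{\delta}{2} \quad \text{for all } x \ge A, \quad \delta > 0 \text{ arbitrary.}$$

Thus

$$E(X_1^n)^2 \le c + \delta \int_A^n dx \le c + n\delta,$$

where $c$ is a constant. It follows that

$$\frac{1}{n\varepsilon^2} E(X_1^n)^2 \le \frac{c}{n\varepsilon^2} + \frac{\delta}{\varepsilon^2},$$

and since $\delta$ is arbitrary, $(1/n\varepsilon^2) E(X_1^n)^2$ can be made arbitrarily small for
sufficiently large $n$. The proof is now completed by the simple observation
that, since $EX_1 = \mu$,

$$\frac{m_n}{n} \to \mu \quad \text{as } n \to \infty.$$


We emphasize that in Theorem 3 we require only that $E|X_1| < \infty$;
nothing is said about the variance. Theorem 3 is due to Khintchine.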
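Example 4. Let $X_1, X_2, \ldots$ be iid rv's with $E|X_1|^k < \infty$ for some positive
integer $k$. Then

$$\frac{1}{n} \sum_{j=1}^{n} X_j^k \xrightarrow{P} EX_1^k \quad \text{as } n \to \infty.$$

Thus, if $EX_1^2 < \infty$, then $\sum_{j=1}^{n} X_j^2 / n \xrightarrow{P} EX_1^2$; and since $\big(\sum_{j=1}^{n} X_j / n\big)^2 \xrightarrow{P} (EX_1)^2$,
it follows that

$$\frac{\sum_{j=1}^{n} X_j^2}{n} - \Big(\frac{\sum_{j=1}^{n} X_j}{n}\Big)^2 \xrightarrow{P} \operatorname{var}(X_1).$$
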
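Example 5. Let $X_1, X_2, \ldots$ be iid rv's with common pdf

$$f(x) = \begin{cases} \dfrac{1 + \delta}{x^{2+\delta}}, & x \ge 1, \\ 0, & x < 1, \end{cases} \qquad \delta > 0.$$

Then

$$E|X| = (1 + \delta) \int_1^{\infty} \frac{1}{x^{1+\delta}}\, dx = \frac{1 + \delta}{\delta} < \infty,$$

and the law of large numbers holds, that is,

$$n^{-1} S_n \xrightarrow{P} \frac{1 + \delta}{\delta} \quad \text{as } n \to \infty.$$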
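
The limit in Example 5 can be checked by sampling from $f$ through its inverse cdf. The following is a sketch under stated assumptions: the cdf is $F(x) = 1 - x^{-(1+\delta)}$ for $x \ge 1$, so $X = (1 - U)^{-1/(1+\delta)}$ with $U$ uniform on $[0, 1)$; the function names, seed, and the choice $\delta = 2$ are illustrative only.

```python
import random


def sample_x(delta, rng):
    """Inverse-cdf draw from f(x) = (1 + delta) / x**(2 + delta), x >= 1."""
    return (1.0 - rng.random()) ** (-1.0 / (1.0 + delta))


def sample_mean(n, delta, seed=0):
    """Sample mean S_n / n of n iid draws from f."""
    rng = random.Random(seed)
    return sum(sample_x(delta, rng) for _ in range(n)) / n


if __name__ == "__main__":
    delta = 2.0
    print(sample_mean(200_000, delta), (1 + delta) / delta)
```

For small $\delta$ the variance is infinite and convergence is visibly slower, although Theorem 3 still applies because only $E|X_1| < \infty$ is required.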


PROBLEMS 6.3 


1. Let $X_1, X_2, \ldots$ be a sequence of iid rv's with common uniform distribution on
$[0, 1]$. Also, let $Z_n = \big(\prod_{i=1}^{n} X_i\big)^{1/n}$ be the geometric mean of $X_1, X_2, \ldots, X_n$, $n = 1,
2, \ldots$. Show that $Z_n \xrightarrow{P} c$, where $c$ is some constant. Find $c$.


2. Let $X_1, X_2, \ldots$ be iid rv's with finite second moment. Let

$$Y_n = \frac{2}{n(n + 1)} \sum_{j=1}^{n} jX_j.$$

Show that $Y_n \xrightarrow{P} EX_1$.


3. Let $X_1, X_2, \ldots$ be a sequence of iid rv's with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2$. Let
$S_k = \sum_{j=1}^{k} X_j$. Does the sequence $\{S_k\}$ obey the WLLN? Does the sequence $\{S_k\}$
obey the WLLN in the sense of Definition 1? If so, find the centering and the
norming constants.


4. Let $\{X_n\}$ be a sequence of rv's for which $\operatorname{var}(X_n) \le C$ for all $n$ and
$\rho_{ij} = \operatorname{cov}(X_i, X_j) \to 0$ as $|i - j| \to \infty$. Show that the WLLN holds.

5. For the following sequences of independent rv's does the WLLN hold?

(a) $P\{X_k = \pm 2^k\} = \frac{1}{2}$.

(b) $P\{X_k = \pm k\} = 1/(2\sqrt{k})$, $P\{X_k = 0\} = 1 - (1/\sqrt{k})$.

(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.

(d P(X, = + 24) = 12^, Р(Х, = + 1) = 3 [1 — (1/29). 

(e) $P\{X_k = \pm\sqrt{k}\} = \frac{1}{2}$.

6. Let $X_1, X_2, \ldots$ be a sequence of independent rv's such that $\operatorname{var}(X_k) < \infty$ for
$k = 1, 2, \ldots$, and $(1/n^2) \sum_{k=1}^{n} \operatorname{var}(X_k) \to 0$ as $n \to \infty$. Prove the WLLN, using
Theorem 2.


7. Let $X_n$ be a sequence of rv's with common finite variance $\sigma^2$. Suppose that the
correlation coefficient between $X_i$ and $X_j$ is $< 0$ for all $i \ne j$. Show that the
WLLN holds for the sequence $\{X_n\}$.

8. Let $\{X_n\}$ be a sequence of rv's such that $X_k$ is independent of $X_j$ for $j \ne k \pm 1$.
If $\operatorname{var}(X_k) < C$ for all $k$, where $C$ is some constant, show that the WLLN
holds for $\{X_k\}$.

9. For any sequence of rv's $\{X_n\}$ show that

$$\max_{1 \le k \le n} |X_k| \xrightarrow{P} 0 \Longrightarrow n^{-1} S_n \xrightarrow{P} 0.$$

10. Let $X_1, X_2, \ldots$ be iid $\mathcal{C}(1, 0)$ rv's. Use Theorem 2 to show that the weak law
of large numbers does not hold. That is, show that

$$E\Big\{\frac{S_n^2}{n^2 + S_n^2}\Big\} \nrightarrow 0 \quad \text{as } n \to \infty, \quad \text{where } S_n = \sum_{k=1}^{n} X_k,\ n = 1, 2, \ldots.$$
zi 


6.4 THE STRONG LAW OF LARGE NUMBERS†


In this section we obtain a stronger form of the law of large numbers discussed
in Section 6.3. Let $X_1, X_2, \ldots$ be a sequence of rv's defined on some
probability space $(\Omega, \mathcal{S}, P)$.


Definition 1. We say that the sequence $\{X_n\}$ obeys the strong law of large
numbers (SLLN) with respect to the norming constants $\{B_n\}$ if there exists
a sequence of (centering) constants $\{A_n\}$ such that

$$(1) \qquad B_n^{-1} (S_n - A_n) \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty.$$

Here $B_n > 0$ and $B_n \to \infty$ as $n \to \infty$.


We will obtain sufficient conditions for a sequence $\{X_n\}$ to obey the SLLN.
In what follows, we will be interested chiefly in the case $B_n = n$. Indeed,
when we speak of the SLLN we will assume that we are speaking of the
norming constants $B_n = n$, unless specified otherwise.
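We start with the Borel-Cantelli lemma. Let $\{A_j\}$ be any sequence of events
in $\mathcal{S}$. We recall that

$$(2) \qquad \limsup_{n\to\infty} A_n = \lim_{n\to\infty} \bigcup_{k=n}^{\infty} A_k = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$$

We will write $A = \limsup_{n\to\infty} A_n$. Note that $A$ is the event that infinitely many of
the $A_n$ occur. We will sometimes write

$$PA = P\big(\limsup_{n\to\infty} A_n\big) = P(A_n \text{ i.o.}),$$

where "i.o." stands for "infinitely often." In view of Theorem 6.2.11 and
Remark 6.2.6 we have $X_n \xrightarrow{\text{a.s.}} 0$ if and only if $P\{|X_n| > \varepsilon \text{ i.o.}\} = 0$ for all
$\varepsilon > 0$.

†This section may be omitted on the first reading.

Theorem 1 (Borel-Cantelli Lemma).

(a) Let $\{A_n\}$ be a sequence of events such that $\sum_{n=1}^{\infty} PA_n < \infty$. Then
$PA = 0$.
(b) If $\{A_n\}$ is an independent sequence of events such that $\sum_{n=1}^{\infty} PA_n =
\infty$, then $PA = 1$.


Proof.
(a) $PA = P\big(\lim_{n\to\infty} \bigcup_{k=n}^{\infty} A_k\big) = \lim_{n\to\infty} P\big(\bigcup_{k=n}^{\infty} A_k\big) \le \lim_{n\to\infty} \sum_{k=n}^{\infty} PA_k = 0$.
(b) We have $A^c = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c$, so that

$$PA^c = P\Big(\lim_{n\to\infty} \bigcap_{k=n}^{\infty} A_k^c\Big) = \lim_{n\to\infty} P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big).$$

For $m > n$, we see that $\bigcap_{k=n}^{\infty} A_k^c \subset \bigcap_{k=n}^{m} A_k^c$, so that

$$P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big) \le \lim_{m\to\infty} P\Big(\bigcap_{k=n}^{m} A_k^c\Big) = \lim_{m\to\infty} \prod_{k=n}^{m} (1 - PA_k),$$

because $\{A_n\}$ is an independent sequence of events. Now we use the elementary
inequality

$$1 - \exp\Big(-\sum_{j=n}^{m} \alpha_j\Big) \le 1 - \prod_{j=n}^{m} (1 - \alpha_j) \le \sum_{j=n}^{m} \alpha_j, \qquad m > n,\ 1 \ge \alpha_j \ge 0,$$

to conclude that

$$P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big) \le \lim_{m\to\infty} \exp\Big(-\sum_{k=n}^{m} PA_k\Big).$$

Since the series $\sum_{k=1}^{\infty} PA_k$ diverges, it follows that $PA^c = 0$ or $PA = 1$.


Corollary. Let $\{A_n\}$ be a sequence of independent events. Then $PA$ is
either 0 or 1.


The corollary follows since $\sum_{n=1}^{\infty} PA_n$ either converges or diverges.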
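
The elementary inequality used in part (b) can be verified numerically for a concrete choice of the $\alpha_j$. The sketch below is an illustration only (the function name and the choice $\alpha_j = 1/j$ are assumptions): it returns the three quantities $1 - \exp(-\sum \alpha_j)$, $1 - \prod (1 - \alpha_j)$, and $\sum \alpha_j$ so that they can be compared directly.

```python
import math


def inequality_terms(alphas):
    """Return (1 - exp(-sum a_j), 1 - prod(1 - a_j), sum a_j) for 0 <= a_j <= 1."""
    total = sum(alphas)
    prod = 1.0
    for a in alphas:
        prod *= (1.0 - a)
    return 1.0 - math.exp(-total), 1.0 - prod, total


if __name__ == "__main__":
    # alpha_j = 1/j for j = 2, ..., 1000; the product telescopes to 1/1000.
    lo, mid, hi = inequality_terms([1.0 / j for j in range(2, 1001)])
    print(lo, mid, hi)
```

With $\alpha_j = 1/j$ the sum diverges as the upper limit grows, so the middle term tends to 1, which is the mechanism behind $PA = 1$ in part (b).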


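As a simple application of the Borel-Cantelli lemma, we obtain a version
of the SLLN.


Theorem 2. If $X_1, X_2, \ldots$ are iid rv's with common mean $\mu$ and finite
fourth moment, then

$$P\Big\{\lim_{n\to\infty} \frac{S_n}{n} = \mu\Big\} = 1.$$

Proof. We have

$$E\Big\{\sum_{i=1}^{n} (X_i - \mu)\Big\}^4 = nE(X_1 - \mu)^4 + 6\binom{n}{2}\sigma^4 \le Cn^2.$$

By Markov's inequality

$$P\Big\{\Big|\sum_{i=1}^{n} (X_i - \mu)\Big| > n\varepsilon\Big\} \le \frac{E\{\sum_{i=1}^{n} (X_i - \mu)\}^4}{(n\varepsilon)^4} \le \frac{Cn^2}{(n\varepsilon)^4} = \frac{C'}{n^2}.$$

Therefore

$$\sum_{n=1}^{\infty} P\{|S_n - \mu n| > n\varepsilon\} < \infty,$$

and it follows by the Borel-Cantelli lemma that with probability 1 only
finitely many of the events $\{\omega : |(S_n/n) - \mu| > \varepsilon\}$ occur, that is, $PA_\varepsilon = 0$,
where

$$A_\varepsilon = \limsup_{n\to\infty} \Big\{\Big|\frac{S_n}{n} - \mu\Big| > \varepsilon\Big\}.$$

The sets $A_\varepsilon$ increase, as $\varepsilon \to 0$, to the $\omega$ set on which $S_n/n \nrightarrow \mu$. Letting
$\varepsilon \to 0$ through a countable set of values, we have

$$P\Big\{\frac{S_n}{n} - \mu \nrightarrow 0\Big\} = P\Big\{\bigcup_{k} A_{1/k}\Big\} = 0.$$


Corollary. If $X_1, X_2, \ldots$ are iid rv's such that $P\{|X_n| < K\} = 1$ for all $n$,
where $K$ is a positive constant, then $n^{-1} S_n \xrightarrow{\text{a.s.}} \mu$.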


Theorem 3. Let $X_1, X_2, \ldots$ be a sequence of independent rv's. Then

$$X_n \xrightarrow{\text{a.s.}} 0 \iff \sum_{n=1}^{\infty} P\{|X_n| > \varepsilon\} < \infty \quad \text{for all } \varepsilon > 0.$$


Proof. Writing $A_n = \{|X_n| > \varepsilon\}$, we see that $\{A_n\}$ is a sequence of independent
events. Since $X_n \xrightarrow{\text{a.s.}} 0$, $X_n \to 0$ on a set $E^c$ with $PE = 0$. A point $\omega \in E^c$
belongs only to a finite number of $A_n$. It follows that

$$\limsup_{n\to\infty} A_n \subset E,$$

hence $P(A_n \text{ i.o.}) = 0$. By the Borel-Cantelli lemma [Theorem 1 (b)] we must
have $\sum_{n=1}^{\infty} PA_n < \infty$. [Otherwise, $\sum_{n=1}^{\infty} PA_n = \infty$, and then $P(A_n \text{ i.o.}) = 1$.]
In the other direction, let

$$A_{1/k} = \limsup_{n\to\infty} \Big\{|X_n| > \frac{1}{k}\Big\},$$

and use the argument in the proof of Theorem 2.


We recall here (see remarks following Definition 4.3.5) that two rv's $X$
and $Y$ are said to be equivalent if $P\{X \ne Y\} = 0$.


Definition 2. Two sequences of rv's $X_n$ and $X_n'$ are said to be tail-equivalent
if they differ almost surely only by a finite number of terms; that is, if for
(almost) all $\omega \in \Omega$ there exists an $n(\omega)$ such that, for $n > n(\omega)$, the sequences
$X_n(\omega)$ and $X_n'(\omega)$ are the same. In symbols, we write $P\{X_n \ne X_n' \text{ i.o.}\} = 0$.


Definition 3. If two sequences of rv's, $X_n$ and $X_n'$, converge on the same
event except for a subset with zero probability, we say that they are
convergence-equivalent.


Let

$$S_n = \sum_{k=1}^{n} X_k \quad \text{and} \quad S_n' = \sum_{k=1}^{n} X_k'.$$


Lemma 1 (Equivalence Lemma). If the series $\sum_{n=1}^{\infty} P\{X_n \ne X_n'\} < \infty$, the
sequences $X_n$ and $X_n'$ are tail-equivalent and $S_n$ and $S_n'$ are convergence-equivalent.
The sequences $B_n^{-1} S_n$ and $B_n^{-1} S_n'$, $B_n \uparrow \infty$, converge on the
same event and to the same limit, except for a null event.


Proof. By the Borel-Cantelli lemma, $P\{\limsup_{n\to\infty} [X_n \ne X_n']\} = 0$, that is,
$P\{X_n \ne X_n' \text{ i.o.}\} = 0$, so that $P\{\omega : X_n(\omega) \ne X_n'(\omega) \text{ for only finitely many } n\} = 1$.
It follows that, if $S_n(\omega)/B_n \to 0$ as $n \to \infty$ for $\omega$ such that $X_n(\omega) \ne
X_n'(\omega)$ only finitely many times, then $S_n'(\omega)/B_n \to 0$ as $n \to \infty$, and conversely.


Remark 1. Let $\{X_n\}$ be any sequence of rv's. We can always truncate $X_n$
suitably to get a sequence of bounded rv's that differ from $X_n$ only on an
event of arbitrarily small probability. Let $X_n'$ be $X_n$ truncated at $c_n$, where
$c_n$ is chosen so that $P\{|X_n| > c_n\} < \varepsilon/2^n$. Then

$$P\Big(\bigcup_{n=1}^{\infty} [X_n \ne X_n']\Big) \le \sum_{n=1}^{\infty} P\{X_n \ne X_n'\} = \sum_{n=1}^{\infty} P\{|X_n| > c_n\} < \varepsilon.$$

By Lemma 1 we see that $X_n$ and $X_n'$ are tail-equivalent.


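As a typical application of the equivalence lemma we prove the following theorem.

Theorem 4 (Jamison, Orey, and Pruitt [52]). Let $\{X_n\}$ be a sequence of iid
rv's, and $\{w_n\}$ be a sequence of positive numbers. Write $S_n = \sum_{k=1}^{n} w_k X_k$
and $W_n = \sum_{k=1}^{n} w_k$. If the sequence $\{W_n\}$ diverges such that $w_n/W_n \to 0$ as
$n \to \infty$, and if

$$(3) \qquad TP\{|X| > T\} \to 0 \quad \text{as } T \to \infty,$$

and if

(4) $\lim_{T\to\infty} EX^T$ exists and equals $\mu$, where $X^T$ is the rv $X_1$ truncated at $T$,

then $S_n/W_n \xrightarrow{P} \mu$.

Proof. Let

$$X_{kn} = \begin{cases} X_k, & |X_k| \le W_n/w_k, \\ 0, & |X_k| > W_n/w_k, \end{cases}$$

and write $S_{nn} = \sum_{k=1}^{n} w_k X_{kn}$. Note that $\max_{1 \le k \le n} (w_k/W_n) \to 0$ as $n \to \infty$
if and only if $\sum_{k=1}^{\infty} w_k = \infty$ and $w_n/W_n \to 0$. It follows that $W_n/w_k \to \infty$
for each $k \le n$, and

$$P\{S_n \ne S_{nn}\} \le \sum_{k=1}^{n} P\{X_{kn} \ne X_k\} = \sum_{k=1}^{n} P\Big\{|X_k| > \frac{W_n}{w_k}\Big\}.$$

By (3) we have

$$\frac{W_n}{w_k} P\Big\{|X_1| > \frac{W_n}{w_k}\Big\} < \varepsilon$$

for sufficiently large $n$. It follows that for sufficiently large $n$

$$P\{S_n \ne S_{nn}\} < \sum_{k=1}^{n} \frac{w_k}{W_n}\, \varepsilon = \varepsilon.$$

It therefore suffices to prove the result for $S_{nn}$ instead of $S_n$. We have

$$\frac{ES_{nn}}{W_n} = \frac{1}{W_n} \sum_{k=1}^{n} w_k EX_{kn} \to \mu$$

by condition (4). Also,

$$(5) \qquad \operatorname{var}\Big(\frac{S_{nn}}{W_n}\Big) = \sum_{k=1}^{n} \frac{w_k^2}{W_n^2} \operatorname{var}(X_{kn}) \le \sum_{k=1}^{n} \frac{w_k^2}{W_n^2} EX_{kn}^2.$$

But

$$\frac{1}{T} E(X^T)^2 = \frac{1}{T} \Big\{-T^2 P\{|X| > T\} + 2\int_0^T xP\{|X| > x\}\, dx\Big\} \to 0 \quad \text{as } T \to \infty.$$

Thus

$$\frac{w_k}{W_n} EX_{kn}^2 \to 0 \quad \text{as } n \to \infty,$$

and for sufficiently large $n$ we get from (5)

$$\operatorname{var}\Big(\frac{S_{nn}}{W_n}\Big) \le \sum_{k=1}^{n} \frac{w_k}{W_n}\, \varepsilon = \varepsilon.$$

Therefore it follows that

$$P\Big\{\Big|\frac{S_{nn} - ES_{nn}}{W_n}\Big| > \delta\Big\} \le \frac{\operatorname{var}(S_{nn}/W_n)}{\delta^2} < \frac{\varepsilon}{\delta^2}$$

for $\delta > 0$. Letting $\varepsilon \to 0$, we see that

$$P\Big\{\Big|\frac{S_{nn} - ES_{nn}}{W_n}\Big| > \delta\Big\} \to 0 \quad \text{as } n \to \infty \text{ for every } \delta > 0,$$

and the proof is complete.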


Remark 2. Note that condition (4) is weaker than $E|X| < \infty$. The converse
result also holds; that is, if $S_n/W_n \xrightarrow{P} \mu$, then (3) and (4) hold for all divergent
sequences $\{w_n\}$ such that $w_n/W_n \to 0$. For details we refer to Jamison,
Orey, and Pruitt [52], page 41.


We next prove some important lemmas that we will need subsequently. 


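Lemma 2 (Kolmogorov's Inequality). Let $X_1, X_2, \ldots, X_n$ be independent rv's
with common mean 0 and variances $\sigma_k^2$, $k = 1, 2, \ldots, n$, respectively. Then
for any $\varepsilon > 0$

$$(6) \qquad P\Big\{\max_{1 \le k \le n} |S_k| \ge \varepsilon\Big\} \le \sum_{i=1}^{n} \frac{\sigma_i^2}{\varepsilon^2}.$$


Proof. Let $A_0 = \Omega$,

$$A_k = \Big\{\max_{1 \le j \le k} |S_j| < \varepsilon\Big\}, \qquad k = 1, 2, \ldots, n,$$

and

$$B_k = A_{k-1} \cap A_k^c$$
$$= \{|S_1| < \varepsilon, \ldots, |S_{k-1}| < \varepsilon\} \cap \{\text{at least one of } |S_1|, \ldots, |S_k| \text{ is } \ge \varepsilon\}$$
$$= \{|S_1| < \varepsilon, \ldots, |S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\}.$$

It follows that the $B_k$ are disjoint,

$$A_n^c = \sum_{k=1}^{n} B_k,$$

and

$$B_k \subset \{|S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\}.$$

As usual, let us write $I_{B_k}$ for the indicator function of the event $B_k$. Then

$$E(S_n I_{B_k})^2 = E\{(S_n - S_k) I_{B_k} + S_k I_{B_k}\}^2$$
$$= E\{(S_n - S_k)^2 I_{B_k} + S_k^2 I_{B_k} + 2S_k (S_n - S_k) I_{B_k}\}.$$

Since $S_n - S_k = X_{k+1} + \cdots + X_n$ and $S_k I_{B_k}$ are independent, and $EX_k = 0$
for all $k$, it follows that

$$E(S_n I_{B_k})^2 = E\{(S_n - S_k) I_{B_k}\}^2 + E(S_k I_{B_k})^2$$
$$\ge E(S_k I_{B_k})^2 \ge \varepsilon^2 PB_k.$$

The last inequality follows from the fact that, in $B_k$, $|S_k| \ge \varepsilon$. Moreover,

$$\sum_{k=1}^{n} E(S_n I_{B_k})^2 = E(S_n^2 I_{A_n^c}) \le E(S_n^2) = \sum_{k=1}^{n} \sigma_k^2,$$

so that

$$\sum_{k=1}^{n} \sigma_k^2 \ge \varepsilon^2 \sum_{k=1}^{n} PB_k = \varepsilon^2 P(A_n^c),$$

as asserted.


Corollary. Take $n = 1$; then

$$P\{|X_1| \ge \varepsilon\} \le \frac{\sigma_1^2}{\varepsilon^2},$$

which is Chebychev's inequality.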
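
Kolmogorov's inequality can be compared with simulation for a simple random walk. The sketch below is an illustration under assumed choices (steps $X_k = \pm 1$ with probability $\frac{1}{2}$ each, so each $\sigma_k^2 = 1$; function names, seed, and parameters are hypothetical): it estimates $P\{\max_{1 \le k \le n} |S_k| \ge \varepsilon\}$ and evaluates the bound $n/\varepsilon^2$.

```python
import random


def max_partial_sum(n, rng):
    """max_{1<=k<=n} |S_k| for a walk with iid steps +1 or -1 (mean 0, variance 1)."""
    s = 0
    m = 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        m = max(m, abs(s))
    return m


def exceed_frequency(n, eps, reps, seed=0):
    """Empirical estimate of P{max_k |S_k| >= eps}."""
    rng = random.Random(seed)
    return sum(1 for _ in range(reps) if max_partial_sum(n, rng) >= eps) / reps


def kolmogorov_bound(n, eps):
    """Right-hand side of the inequality: sum of variances over eps^2 = n / eps^2."""
    return n / eps ** 2


if __name__ == "__main__":
    print(exceed_frequency(50, 15, 2000), kolmogorov_bound(50, 15))
```

The point of the inequality is that the same bound that Chebychev gives for $|S_n|$ alone already controls the whole path up to time $n$.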
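Lemma 3 (Kronecker Lemma). If $\sum_{n=1}^{\infty} x_n$ converges to $s$ (finite) and $b_n \uparrow \infty$,
then

$$b_n^{-1} \sum_{k=1}^{n} b_k x_k \to 0.$$


Proof. Writing $b_0 = 0$, $a_k = b_k - b_{k-1}$, and $s_{n+1} = \sum_{k=1}^{n} x_k$, we have

$$\frac{1}{b_n} \sum_{k=1}^{n} b_k x_k = \frac{1}{b_n} \sum_{k=1}^{n} b_k (s_{k+1} - s_k)$$
$$= \frac{1}{b_n} \Big(b_n s_{n+1} + \sum_{k=1}^{n} b_{k-1} s_k - \sum_{k=1}^{n} b_k s_k\Big)$$
$$= s_{n+1} - \frac{1}{b_n} \sum_{k=1}^{n} (b_k - b_{k-1}) s_k$$
$$= s_{n+1} - \frac{1}{b_n} \sum_{k=1}^{n} a_k s_k.$$

It therefore suffices to show that $b_n^{-1} \sum_{k=1}^{n} a_k s_k \to s$. Since $s_n \to s$, there
exists an $n_0 = n_0(\varepsilon)$ such that

$$|s_n - s| < \frac{\varepsilon}{2} \quad \text{for } n > n_0.$$

Since $b_n \uparrow \infty$, let $n_1$ be an integer $> n_0$ such that

$$\Big|b_n^{-1} \sum_{k=1}^{n_0} (b_k - b_{k-1})(s_k - s)\Big| < \frac{\varepsilon}{2} \quad \text{for } n > n_1.$$

Writing

$$r_n = b_n^{-1} \sum_{k=1}^{n} (b_k - b_{k-1}) s_k,$$

we see that

$$|r_n - s| = \frac{1}{b_n} \Big|\sum_{k=1}^{n} (b_k - b_{k-1})(s_k - s)\Big|,$$

and, choosing $n > n_1$, we have

$$|r_n - s| \le \Big|b_n^{-1} \sum_{k=1}^{n_0} (b_k - b_{k-1})(s_k - s)\Big| + \frac{1}{b_n} \sum_{k=n_0+1}^{n} (b_k - b_{k-1}) \frac{\varepsilon}{2} < \varepsilon.$$

This completes the proof.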
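
A concrete instance makes the Kronecker lemma tangible. In the sketch below (an illustrative choice, not from the text) we take $x_k = 1/k^2$, whose series converges, and $b_k = k$; then $b_n^{-1} \sum_{k=1}^{n} b_k x_k = n^{-1} \sum_{k=1}^{n} 1/k$, which should decrease to 0.

```python
def kronecker_average(n):
    """(1 / b_n) * sum_{k=1}^n b_k x_k with b_k = k and x_k = 1 / k**2."""
    return sum(k * (1.0 / k ** 2) for k in range(1, n + 1)) / n


if __name__ == "__main__":
    for n in (10, 1000, 100_000):
        print(n, kronecker_average(n))
```

The decay is slow (of order $\log n / n$ in this instance), which is typical: the lemma gives convergence to 0 but no rate.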
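We need the following criterion for almost sure convergence.

Theorem 5 (Cauchy Criterion). $X_n \xrightarrow{\text{a.s.}} X$ if and only if

$$\lim_{n\to\infty} P\Big\{\sup_{m} |X_{n+m} - X_n| \le \varepsilon\Big\} = 1 \quad \text{for every } \varepsilon > 0.$$

Proof. Let $X_n \xrightarrow{\text{a.s.}} X$. We have for $\delta > 0$

$$\Big\{|X_{n+m} - X| \le \frac{\delta}{2}\Big\} \cap \Big\{|X_n - X| \le \frac{\delta}{2}\Big\} \subset \{|X_{n+m} - X_n| \le \delta\},$$

so that

$$\bigcap_{k=n}^{\infty} \Big\{|X_k - X| \le \frac{\delta}{2}\Big\} \subset \Big\{|X_{n+m} - X| \le \frac{\delta}{2}\Big\} \cap \Big\{|X_n - X| \le \frac{\delta}{2}\Big\} \subset \{|X_{n+m} - X_n| \le \delta\}$$

for all $m$. It follows that

$$\bigcap_{k=n}^{\infty} \Big\{|X_k - X| \le \frac{\delta}{2}\Big\} \subset \bigcap_{m=1}^{\infty} \{|X_{n+m} - X_n| \le \delta\}.$$

But

$$\bigcap_{m=1}^{\infty} \{|X_{n+m} - X_n| \le \delta\} = \{|X_{n+m} - X_n| \le \delta \text{ for all } m\} = \Big\{\sup_{m} |X_{n+m} - X_n| \le \delta\Big\},$$

so that

$$P\Big\{\sup_{m} |X_{n+m} - X_n| \le \delta\Big\} \ge P\Big(\bigcap_{k=n}^{\infty} \Big\{|X_k - X| \le \frac{\delta}{2}\Big\}\Big).$$

Letting $n \to \infty$, we see that

$$\lim_{n\to\infty} P\Big\{\sup_{m} |X_{n+m} - X_n| \le \delta\Big\} \ge \lim_{n\to\infty} P\Big(\bigcap_{k=n}^{\infty} \Big\{|X_k - X| \le \frac{\delta}{2}\Big\}\Big) = 1,$$

as asserted (Remark 6.2.6).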
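Conversely, let

$$\lim_{n\to\infty} P\Big\{\sup_{m} |X_{n+m} - X_n| \le \varepsilon\Big\} = 1 \quad \text{for every } \varepsilon > 0.$$

For any integer $k \ge n$, we have

$$\bigcap_{m=1}^{\infty} \Big\{|X_{n+m} - X_n| \le \frac{\delta}{2}\Big\} \subset \Big\{|X_k - X_n| \le \frac{\delta}{2}\Big\},$$

so that

$$\bigcap_{m=1}^{\infty} \Big\{|X_{n+m} - X_n| \le \frac{\delta}{2}\Big\} \subset \{|X_{k+m} - X_k| \le \delta\}$$

holds for all $k \ge n$ and all $m$. It follows that

$$\bigcap_{m=1}^{\infty} \Big\{|X_{n+m} - X_n| \le \frac{\delta}{2}\Big\} \subset \bigcap_{m=1}^{\infty} \{|X_{k+m} - X_k| \le \delta\}$$

for all $k \ge n$, so that

$$\bigcap_{m=1}^{\infty} \Big\{|X_{n+m} - X_n| \le \frac{\delta}{2}\Big\} \subset \bigcap_{k=n}^{\infty} \bigcap_{m=1}^{\infty} \{|X_{k+m} - X_k| \le \delta\}$$

and

$$\lim_{n\to\infty} P\Big(\bigcap_{m=1}^{\infty} \Big\{|X_{n+m} - X_n| \le \frac{\delta}{2}\Big\}\Big) \le \lim_{n\to\infty} P\Big(\bigcap_{k=n}^{\infty} \bigcap_{m=1}^{\infty} \{|X_{k+m} - X_k| \le \delta\}\Big)$$
$$= P\Big(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \bigcap_{m=1}^{\infty} \{|X_{k+m} - X_k| \le \delta\}\Big)$$

(since $\bigcap_{k=n}^{\infty} \bigcap_{m=1}^{\infty} \{|X_{k+m} - X_k| \le \delta\}$ is a nondecreasing sequence of sets).
Since

$$\lim_{n\to\infty} P\Big\{\sup_{m} |X_{n+m} - X_n| \le \varepsilon\Big\} = 1 \quad \text{for every } \varepsilon > 0,$$

we see that

$$P\Big(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \bigcap_{m=1}^{\infty} \{|X_{k+m} - X_k| \le \delta\}\Big) = 1$$

for any fixed $\delta > 0$. Let $\delta_p > 0$, $\delta_p \downarrow 0$ as $p \to \infty$, and consider the set

$$A = \bigcap_{p=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \bigcap_{m=1}^{\infty} \{|X_{k+m} - X_k| \le \delta_p\}.$$

Now for any sequence $A_j$, by Boole's inequality

$$P\Big(\bigcap_{j} A_j\Big) \ge 1 - \sum_{j} P(A_j^c).$$

Thus

$$PA \ge 1 - \sum_{p=1}^{\infty} P\Big[\Big(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \bigcap_{m=1}^{\infty} \{|X_{k+m} - X_k| \le \delta_p\}\Big)^c\Big] = 1,$$

and it follows that $PA = 1$.

The sequence $\{X_n\}$ satisfies the Cauchy criterion for every point $\omega \in A$.
Therefore there exists a function $X^*$ such that $\lim_{n\to\infty} X_n(\omega) = X^*(\omega)$ for
all $\omega \in A$. We define

$$X(\omega) = \begin{cases} X^*(\omega), & \omega \in A, \\ 0, & \omega \in A^c. \end{cases}$$

Since $PA^c = 0$, we see that $X_n \xrightarrow{\text{a.s.}} X$. This completes the proof of the
theorem.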


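Theorem 6. If $\sum_{n=1}^{\infty} \operatorname{var}(X_n) < \infty$, then $\sum_{n=1}^{\infty} (X_n - EX_n)$ converges almost
surely.


Proof. By Kolmogorov's inequality

$$P\Big\{\max_{1 \le k \le m} |S_{n+k} - S_n - E(S_{n+k} - S_n)| \ge \varepsilon\Big\} \le \frac{1}{\varepsilon^2} \sum_{k=n+1}^{n+m} \operatorname{var}(X_k).$$

Letting $m, n \to \infty$, we see that

$$\lim_{n\to\infty} P\Big\{\sup_{k} |S_{n+k} - ES_{n+k} - (S_n - ES_n)| > \varepsilon\Big\} = 0,$$

and by the Cauchy criterion the sequence $\{S_n - ES_n\}$ converges with
probability 1, as asserted.


Corollary 1. Let $\{X_n\}$ be independent rv's. If

$$\sum_{k=1}^{\infty} \frac{\operatorname{var}(X_k)}{B_k^2} < \infty, \qquad B_n \uparrow \infty,$$

then

$$\frac{S_n - ES_n}{B_n} \xrightarrow{\text{a.s.}} 0.$$

The corollary follows from Theorem 6 and the Kronecker lemma.


Corollary 2. Every sequence $\{X_n\}$ of independent rv's with uniformly
bounded variances obeys the SLLN.


If $\operatorname{var}(X_k) \le A$ for all $k$, and $B_k = k$, then

$$\sum_{k=1}^{\infty} \frac{\sigma_k^2}{k^2} \le A \sum_{k=1}^{\infty} \frac{1}{k^2} < \infty,$$

and it follows that

$$\frac{S_n - ES_n}{n} \xrightarrow{\text{a.s.}} 0.$$


Corollary 3 (Borel's Strong Law of Large Numbers). For a sequence of
Bernoulli trials with (constant) probability $p$ of success, the SLLN holds
(with $B_n = n$, and $A_n = np$).

Since $EX_k = p$, $\operatorname{var}(X_k) = p(1 - p) \le \frac{1}{4}$, $0 < p < 1$,
the result follows from Corollary 2.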


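Lemma 4.

$$(7) \qquad \sum_{n=1}^{\infty} P\{|X| \ge n\} \le E|X| \le 1 + \sum_{n=1}^{\infty} P\{|X| \ge n\}.$$


Proof. Let $X$ be of the continuous type with pdf $f$. Then

$$E|X| = \int_{-\infty}^{\infty} |x| f(x)\, dx = \sum_{k=0}^{\infty} \int_{k \le |x| < k+1} |x| f(x)\, dx,$$

and it follows that

$$\sum_{k=0}^{\infty} kP\{k \le |X| < k + 1\} \le E|X| \le \sum_{k=0}^{\infty} (k + 1) P\{k \le |X| < k + 1\}.$$

Now

$$\sum_{k=0}^{\infty} kP\{k \le |X| < k + 1\} = \sum_{k=1}^{\infty} \sum_{n=1}^{k} P\{k \le |X| < k + 1\} = \sum_{n=1}^{\infty} \sum_{k=n}^{\infty} P\{k \le |X| < k + 1\} = \sum_{n=1}^{\infty} P\{|X| \ge n\}.$$

Similarly,

$$\sum_{k=0}^{\infty} (k + 1) P\{k \le |X| < k + 1\} = \sum_{n=1}^{\infty} P\{|X| \ge n\} + 1.$$

The proof for the case in which $X$ is of the discrete type is similar. See
also Lemma 3.2.2.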


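Theorem 7 (Kolmogorov's Strong Law of Large Numbers). Let $X_1, X_2, \ldots$ be
iid rv's with common law $\mathcal{L}(X)$. Then

$$n^{-1} S_n \xrightarrow{\text{a.s.}} \mu \ \text{ finite}$$

if and only if $E|X| < \infty$, and then $\mu = EX$.


Proof. If $n^{-1} S_n \xrightarrow{\text{a.s.}} \mu$ finite, then

$$\frac{X_n}{n} = \frac{S_n}{n} - \frac{n - 1}{n} \cdot \frac{S_{n-1}}{n - 1} \xrightarrow{\text{a.s.}} 0.$$

By the Borel-Cantelli lemma, we must have $\sum_{n=1}^{\infty} P\{|X_n| \ge n\} < \infty$,
which by Lemma 4 implies $E|X| < \infty$. It follows, moreover, by the
WLLN, that $\mu = EX$.


Conversely, let $E|X| < \infty$. We define a sequence of truncated rv's $X_n^*$ as
follows:

$$X_n^* = \begin{cases} X_n, & |X_n| < n, \\ 0, & \text{otherwise.} \end{cases}$$

Writing $S_n^* = \sum_{k=1}^{n} X_k^*$, we have

$$\sum_{k=1}^{\infty} P\{X_k \ne X_k^*\} = \sum_{k=1}^{\infty} P\{|X_k| \ge k\} = \sum_{k=1}^{\infty} P\{|X| \ge k\} < \infty,$$

since the $X_k$'s are identically distributed and $E|X| < \infty$ (Lemma 4),
and it follows by Lemma 1 that $n^{-1} S_n$ and $n^{-1} S_n^*$
converge to the same limit and on the same set, except for a null set.
Therefore it is sufficient to show that $n^{-1} S_n^* \xrightarrow{\text{a.s.}} EX$. We have

$$\operatorname{var}(X_n^*) \le E(X_n^*)^2 = \int_{-n}^{n} x^2 f(x)\, dx = \sum_{k=0}^{n-1} \int_{k \le |x| < k+1} x^2 f(x)\, dx \le \sum_{k=0}^{n-1} (k + 1)^2 P\{k \le |X| < k + 1\},$$

where $f$ is the pdf of $X$. A similar argument holds for the discrete case. Thus
in either case

$$\sum_{n=1}^{\infty} \frac{\operatorname{var}(X_n^*)}{n^2} \le \sum_{n=1}^{\infty} \sum_{k=0}^{n-1} \frac{(k + 1)^2}{n^2} P\{k \le |X| < k + 1\}$$
$$= \sum_{k=1}^{\infty} (k + 1)^2 P\{k \le |X| < k + 1\} \sum_{n=k+1}^{\infty} \frac{1}{n^2} + P\{0 \le |X| < 1\} \sum_{n=1}^{\infty} \frac{1}{n^2}.$$

Then

$$\sum_{n=k+1}^{\infty} \frac{1}{n^2} \le \frac{1}{(k + 1)^2} + \int_{k+1}^{\infty} \frac{dx}{x^2} \le \frac{2}{k + 1},$$

and it follows that

$$\sum_{n=1}^{\infty} \frac{\operatorname{var}(X_n^*)}{n^2} \le 2 \sum_{k=1}^{\infty} (k + 1) P\{k \le |X| < k + 1\} + 2P\{0 \le |X| < 1\} < \infty$$

by Lemma 4. It follows from Corollary 1 of Theorem 6 that

$$n^{-1} S_n^* - n^{-1} ES_n^* \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty.$$

Since $EX_n^* \to EX$ as $n \to \infty$, it follows by Lemma 3 that $n^{-1} ES_n^* \to EX$.
We can therefore replace $n^{-1} ES_n^*$ by $EX$ and conclude that

$$n^{-1} S_n^* \xrightarrow{\text{a.s.}} EX.$$
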
PROBLEMS 6.4


1. For the following sequences of independent rv's does the SLLN hold?

(a) $P\{X_k = \pm 2^k\} = \frac{1}{2}$.
(b) $P\{X_k = \pm k\} = 1/(2\sqrt{k})$, $P\{X_k = 0\} = 1 - (1/\sqrt{k})$.
(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.


2. Let $X_1, X_2, \ldots$ be a sequence of independent rv's with $\sum_{k=1}^{\infty} \operatorname{var}(X_k)/k^2 < \infty$.
Show that

$$\frac{1}{n^2} \sum_{k=1}^{n} \operatorname{var}(X_k) \to 0 \quad \text{as } n \to \infty.$$

Does the converse also hold?

3. For what values of $\alpha$ does the SLLN hold for the sequence

$$P\{X_k = \pm k^{\alpha}\} = \tfrac{1}{2}?$$


4. Let $\{\sigma_k^2\}$ be a sequence of real numbers such that $\sum_{k=1}^{\infty} \sigma_k^2/k^2 = \infty$. Show that
there exists a sequence of independent rv's $\{X_k\}$ with $\operatorname{var}(X_k) = \sigma_k^2$, $k = 1,
2, \ldots$, such that $n^{-1} \sum_{k=1}^{n} (X_k - EX_k)$ does not converge to 0 almost surely.


[Hint: Let $P\{X_k = \pm k\} = \sigma_k^2/2k^2$, $P\{X_k = 0\} = 1 - (\sigma_k^2/k^2)$ if $\sigma_k/k \le 1$, and
$P\{X_k = \pm\sigma_k\} = \frac{1}{2}$ if $\sigma_k/k > 1$. Apply the Borel-Cantelli lemma to $\{|X_n| \ge n\}$.]


5. Let $X_n$ be a sequence of iid rv's with $E|X_n| = +\infty$. Show that, for every
positive number $A$, $P\{|X_n| > nA \text{ i.o.}\} = 1$ and $P\{|S_n| > nA \text{ i.o.}\} = 1$.


6. Construct an example to show that the converse of Theorem 1(a) does not
hold.


6.5 LIMITING MOMENT GENERATING FUNCTIONS 


Let $X_1, X_2, \ldots$ be a sequence of rv's. Let $F_n$ be the df of $X_n$, $n = 1, 2, \ldots$, and
suppose that the mgf $M_n(t)$ of $F_n$ exists. What happens to $M_n(t)$ as $n \to \infty$?
Does it always converge to an mgf?

Example 1. Let $\{X_n\}$ be a sequence of rv's with pmf $P\{X_n = -n\} = 1$,
$n = 1, 2, \ldots$. We have

$$M_n(t) = Ee^{tX_n} = e^{-tn} \to 0 \quad \text{as } n \to \infty \text{ for all } t > 0,$$

$M_n(t) \to +\infty$ for all $t < 0$, and $M_n(t) \to 1$ at $t = 0$.

Thus

$$M_n(t) \to M(t) = \begin{cases} 0, & t > 0, \\ 1, & t = 0, \\ \infty, & t < 0, \end{cases} \quad \text{as } n \to \infty.$$

But $M(t)$ is not an mgf. Note that if $F_n$ is the df of $X_n$, then

$$F_n(x) = \begin{cases} 0 & \text{if } x < -n, \\ 1 & \text{if } x \ge -n, \end{cases} \ \to F(x) = 1 \quad \text{for all } x,$$

and $F$ is not a df.


Next suppose that $X_n$ has mgf $M_n$ and $X_n \xrightarrow{L} X$, where $X$ is an rv with
mgf $M$. Does $M_n(t) \to M(t)$ as $n \to \infty$? The answer to this question is in
the negative.


Example 2 (Curtiss [20]). Consider the df

$$F_n(x) = \begin{cases} 0, & x < -n, \\ \tfrac{1}{2} + c_n \tan^{-1}(nx), & -n \le x < n, \\ 1, & x \ge n, \end{cases}$$

where $c_n = 1/[2\tan^{-1}(n^2)]$. Clearly, as $n \to \infty$,

$$F_n(x) \to F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

at all points of continuity of the df $F$. The mgf associated with $F_n$ is

$$M_n(t) = \int_{-n}^{n} c_n e^{tx} \frac{n}{1 + n^2 x^2}\, dx,$$

which exists for all $t$. The mgf corresponding to $F$ is $M(t) = 1$ for all $t$.
But $M_n(t) \nrightarrow M(t)$, since $M_n(t) \to \infty$ if $t \ne 0$. Indeed

$$M_n(t) > \int_0^n c_n \frac{|t|^3 x^3}{6} \cdot \frac{n}{1 + n^2 x^2}\, dx.$$

The following result is a weaker version of the continuity theorem due to
Lévy and Cramér. We refer the reader to Lukacs [75], page 47, or Curtiss
[20], for details of the proof.


Theorem 1 (Continuity Theorem). Let $\{F_n\}$ be a sequence of df's with
corresponding mgf's $\{M_n\}$, and suppose that $M_n(t)$ exists for $|t| \le t_0$ for
every $n$. If there exists a df $F$ with corresponding mgf $M$ which exists for
$|t| \le t_1 < t_0$, such that $M_n(t) \to M(t)$ as $n \to \infty$ for every $t \in [-t_1, t_1]$,
then $F_n \xrightarrow{w} F$.


Example 3. Let $X_n$ be an rv with pmf

$$P\{X_n = 1\} = \frac{1}{n}, \qquad P\{X_n = 0\} = 1 - \frac{1}{n}.$$

Then $M_n(t) = (1/n)e^t + [1 - (1/n)]$ exists for all $t \in \mathbb{R}$, and $M_n(t) \to 1$ as
$n \to \infty$ for all $t$. Here $M(t) = 1$ is the mgf of an rv $X$ degenerate at 0.
Thus $X_n \xrightarrow{L} X$.


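The following lemma is quite useful in applications of Theorem 1.

Lemma 1. Let us write $f(x) = o(x)$, if $f(x)/x \to 0$ as $x \to 0$. We have

$$\lim_{n\to\infty} \Big\{1 + \frac{a}{n} + o\Big(\frac{1}{n}\Big)\Big\}^n = e^a \quad \text{for every real } a.$$


Proof. By Taylor's expansion (Theorem P.2.7) we have

$$f(x) = f(0) + xf'(\theta x)$$
$$= f(0) + xf'(0) + \{f'(\theta x) - f'(0)\}x, \qquad 0 < \theta < 1.$$

If $f'(x)$ is continuous at $x = 0$, then as $x \to 0$

$$f(x) = f(0) + xf'(0) + o(x).$$

Taking $f(x) = \log(1 + x)$, we have $f'(x) = (1 + x)^{-1}$, which is continuous
at $x = 0$, so that

$$\log(1 + x) = x + o(x).$$

Then for sufficiently large $n$

$$n \log\Big\{1 + \frac{a}{n} + o\Big(\frac{1}{n}\Big)\Big\} = n\Big\{\frac{a}{n} + o\Big(\frac{1}{n}\Big) + o\Big[\frac{a}{n} + o\Big(\frac{1}{n}\Big)\Big]\Big\}$$
$$= a + n\, o\Big(\frac{1}{n}\Big)$$
$$= a + o(1).$$

It follows that

$$\Big\{1 + \frac{a}{n} + o\Big(\frac{1}{n}\Big)\Big\}^n = e^{a + o(1)},$$

as asserted.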
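
The limit in Lemma 1 can be observed numerically. The sketch below is illustrative only: the function name is hypothetical, and the $o(1/n)$ term is taken to be $1/n^2$ by default, which is one admissible choice among many.

```python
import math


def power_sequence(a, n, remainder=lambda n: 1.0 / n ** 2):
    """(1 + a/n + r(n))**n, where r(n) = o(1/n); default r(n) = 1/n**2 (an assumption)."""
    return (1.0 + a / n + remainder(n)) ** n


if __name__ == "__main__":
    a = 2.0
    for n in (10, 1000, 10 ** 6):
        print(n, power_sequence(a, n), math.exp(a))
```

Any other remainder with $n \cdot r(n) \to 0$ can be passed as `remainder` and gives the same limit $e^a$.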


Example 4. Let $X_1, X_2, \ldots$ be iid $b(1, p)$ rv's. Also, let $S_n = \sum_{k=1}^{n} X_k$, and let
$M_n(t)$ be the mgf of $S_n$. Then

$$M_n(t) = (q + pe^t)^n \quad \text{for all } t,$$

where $q = 1 - p$. If we let $n \to \infty$ in such a way that $np$ remains constant
at $\lambda$, say, then, by Lemma 1,

$$M_n(t) = \Big(1 - \frac{\lambda}{n} + \frac{\lambda}{n} e^t\Big)^n = \Big\{1 + \frac{\lambda(e^t - 1)}{n}\Big\}^n \to \exp\{\lambda(e^t - 1)\}$$

for all $t$,
which is the mgf of a $P(\lambda)$ rv. Thus the binomial distribution function
approaches the Poisson df, provided that $n \to \infty$ in such a way that $np =
\lambda > 0$.
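
The Poisson approximation of Example 4 can be checked directly on the probability mass functions. The following sketch is an illustration (function names and the truncation point `kmax` are assumptions): it computes the largest pointwise gap between the $b(n, \lambda/n)$ and $P(\lambda)$ pmf's over $k = 0, 1, \ldots, \min(k_{\max}, n)$.

```python
import math


def binomial_pmf(k, n, p):
    """P{S_n = k} for S_n ~ b(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P{X = k} for X ~ P(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_pmf_gap(n, lam, kmax=20):
    """Largest |binomial - Poisson| pmf difference for k = 0..min(kmax, n), with p = lam/n."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(min(kmax, n) + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_pmf_gap(n, 2.0))
```

The gap shrinks roughly in proportion to $1/n$ when $np = \lambda$ is held fixed.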
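Example 5. Let $X \sim P(\lambda)$. The mgf of $X$ is given by

$$M(t) = \exp\{\lambda(e^t - 1)\} \quad \text{for all } t.$$

Let $Y = (X - \lambda)/\sqrt{\lambda}$. Then the mgf of $Y$ is given by

$$M_Y(t) = e^{-t\sqrt{\lambda}}\, M\Big(\frac{t}{\sqrt{\lambda}}\Big).$$

Also,

$$\log M_Y(t) = -t\sqrt{\lambda} + \log M\Big(\frac{t}{\sqrt{\lambda}}\Big)$$
$$= -t\sqrt{\lambda} + \lambda(e^{t/\sqrt{\lambda}} - 1)$$
$$= -t\sqrt{\lambda} + \lambda\Big(\frac{t}{\sqrt{\lambda}} + \frac{t^2}{2\lambda} + \frac{t^3}{3!\,\lambda^{3/2}} + \cdots\Big)$$
$$= \frac{t^2}{2} + \frac{t^3}{3!\,\lambda^{1/2}} + \cdots.$$

It follows that

$$\log M_Y(t) \to \frac{t^2}{2} \quad \text{as } \lambda \to \infty,$$

so that $M_Y(t) \to e^{t^2/2}$ as $\lambda \to \infty$, which is the mgf of an $\mathcal{N}(0, 1)$ rv.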
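
The expansion in Example 5 can be evaluated numerically. The sketch below is illustrative (the function name is hypothetical); it computes $\log M_Y(t) = -t\sqrt{\lambda} + \lambda(e^{t/\sqrt{\lambda}} - 1)$ using `math.expm1` to limit rounding error when $t/\sqrt{\lambda}$ is small, and compares it with $t^2/2$.

```python
import math


def log_mgf_standardized_poisson(t, lam):
    """log M_Y(t) for Y = (X - lam) / sqrt(lam), X ~ P(lam)."""
    root = math.sqrt(lam)
    # expm1(x) = e**x - 1, computed accurately for small x.
    return -t * root + lam * math.expm1(t / root)


if __name__ == "__main__":
    t = 1.0
    for lam in (1e2, 1e4, 1e6):
        print(lam, log_mgf_standardized_poisson(t, lam), t * t / 2)
```

The leading error term is $t^3/(3!\,\lambda^{1/2})$, so the approach to $t^2/2$ is of order $\lambda^{-1/2}$.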


For more examples see Section 6.6. 


PROBLEMS 6.5
ees p). Show that 
2pX +, as p 0, 
where y: ^ s 09. ЖҮ д 
2. Let $X_n \sim NB(r_n; 1 - p_n)$, $n = 1, 2, \ldots$. Show that $X_n \xrightarrow{L} X$ as $r_n \to \infty$, $p_n \to 0$ in such
a way that $r_n p_n \to \lambda$, where $X \sim P(\lambda)$.
Y 8 Зе кера IV'S with dcl given w P = =+ ўч Spo 


e dj where B > 0 is a constant 
utionof X,/n. — 
he limiting distribution of X, /n*. 
6. Let $X_1, X_2, \ldots, X_n$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all $i$ and
$\operatorname{cov}(X_i, X_j) = \rho$, $i, j = 1, 2, \ldots$ $(i \ne j)$. What is the limiting distribution of
$n^{-1} S_n$, where $S_n = \sum_{k=1}^{n} X_k$?


6.6 THE CENTRAL LIMIT THEOREM 


Let $X_1, X_2, \ldots$ be a sequence of rv's, and let $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$.
In Sections 3 and 4 we investigated the convergence of the sequence of rv's
$B_n^{-1}(S_n - A_n)$ to the degenerate rv. In this section we examine the convergence
of $B_n^{-1}(S_n - A_n)$ to a nondegenerate rv. Suppose that, for a suitable
choice of constants $A_n$ and $B_n > 0$, the rv's $B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. What are
the properties of this limit rv $Y$? The question as posed is far too general
and is not of much interest unless the rv's $X_i$ are suitably restricted. For
example, if we take $X_1$ with df $F$ and $X_2, X_3, \ldots$ to be 0 with probability 1,
choosing $A_n = 0$ and $B_n = 1$ leads to $F$ as the limit df.

We recall (corollary to Theorem 2 3 20) that, if $X_1, X_2, \ldots, X_n$ are iid rv's
with common law $\mathcal{C}(1, 0)$, then $n^{-1} S_n$ is also $\mathcal{C}(1, 0)$. Similarly, if $X_1, X_2, \ldots,
X_n$ are iid $\mathcal{N}(0, 1)$ rv's, then $n^{-1/2} S_n$ is also $\mathcal{N}(0, 1)$ (Corollary 2 to Theorem
5.3.25). We note thus that for certain sequences of rv's there exist sequences
$A_n$ and $B_n > 0$, $B_n \to \infty$, such that $B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. In the Cauchy case
$B_n = n$, $A_n = 0$, and in the normal case $B_n = n^{1/2}$, $A_n = 0$. Moreover, we
see that Cauchy and normal distributions appear as limiting distributions
in these two cases, because of the reproductive nature of the distributions.
Cauchy and normal distributions are examples of stable distributions.


Definition 1. Let $X_1, X_2$ be iid nondegenerate rv's with common df $F$. Let $a_1, a_2$ be any positive constants. We say that $F$ is stable if there exist constants $A$ and $B$ (depending on $a_1, a_2$) such that the rv $B^{-1}(a_1X_1 + a_2X_2 - A)$ also has the df $F$.


Let $X_1, X_2, \ldots$ be iid rv's with common df $F$. We remark without proof (see Loève [71], 327) that only stable distributions occur as limits. To make this statement more precise we make the following definition.


Definition 2. Let $X_1, X_2, \ldots$ be iid rv's with common df $F$. We say that $F$ belongs to the domain of attraction of a distribution $V$ if there exist norming constants $B_n > 0$ and centering constants $A_n$ such that, as $n \to \infty$,

(1) $P\{B_n^{-1}(S_n - A_n) \le x\} \to V(x)$

at all continuity points $x$ of $V$.

In view of the statement after Definition 1, we see that only stable distributions possess domains of attraction. From Definition 1 we also note that
each stable law belongs to its own domain of attraction. The study of stable 
distributions is beyond the scope of this book. We shall restrict ourselves to 
seeking conditions under which the limit law V is the normal distribution. 
The importance of the normal distribution in statistics is due largely to the 
fact that a wide class of distributions F belongs to the domain of attraction 
of the normal law. Let us consider some examples. 


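Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's. Let

$S_n = \sum_{k=1}^{n} X_k$, $A_n = ES_n = np$, $B_n = \sqrt{\operatorname{var}(S_n)} = \sqrt{np(1 - p)}$.

Then

$M_n(t) = E\exp\Bigl\{\frac{S_n - np}{\sqrt{np(1-p)}}\,t\Bigr\}$
$= \exp\Bigl\{-\frac{npt}{\sqrt{npq}}\Bigr\}\Bigl(q + p\exp\Bigl\{\frac{t}{\sqrt{npq}}\Bigr\}\Bigr)^n$, $q = 1 - p$,
$= \Bigl[q\exp\Bigl\{-\frac{pt}{\sqrt{npq}}\Bigr\} + p\exp\Bigl\{\frac{qt}{\sqrt{npq}}\Bigr\}\Bigr]^n$
$= \Bigl[1 + \frac{t^2}{2n} + o\Bigl(\frac{1}{n}\Bigr)\Bigr]^n$.

It follows from Lemma 6.5.1 that

$M_n(t) \to e^{t^2/2}$ as $n \to \infty$,

and since $e^{t^2/2}$ is the mgf of an $\mathcal{N}(0, 1)$ rv, we have by the continuity theorem

$P\Bigl\{\frac{S_n - np}{\sqrt{npq}} \le x\Bigr\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$ for all $x \in \mathcal{R}$.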


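Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\chi^2(1)$ rv's. Then $S_n \sim \chi^2(n)$, $ES_n = n$, and var$(S_n) = 2n$. Also let $Z_n = (S_n - n)/\sqrt{2n}$; then

$M_n(t) = Ee^{tZ_n} = \exp\bigl\{-t\sqrt{n/2}\bigr\}\Bigl(1 - \frac{2t}{\sqrt{2n}}\Bigr)^{-n/2}$, $2t < \sqrt{2n}$,
$= \Bigl[\exp\bigl\{t\sqrt{2/n}\bigr\} - t\sqrt{2/n}\,\exp\bigl\{t\sqrt{2/n}\bigr\}\Bigr]^{-n/2}$, $t < \sqrt{n/2}$.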
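Using Taylor's approximation, we get

$\exp\bigl\{t\sqrt{2/n}\bigr\} = 1 + t\sqrt{2/n} + \frac{t^2}{2}\bigl(\sqrt{2/n}\bigr)^2 + \frac{1}{6}\exp(\theta_n)\bigl(t\sqrt{2/n}\bigr)^3$,

where $0 < \theta_n < t\sqrt{2/n}$. It follows that

$M_n(t) = \Bigl(1 - \frac{t^2}{n} + \frac{\zeta(n)}{n}\Bigr)^{-n/2}$,

where

$\zeta(n) = -t^3\sqrt{2/n} + \frac{t^3}{3}\sqrt{2/n}\,\exp(\theta_n)\bigl(1 - t\sqrt{2/n}\bigr) \to 0$ as $n \to \infty$,

for every fixed $t$. We have from Lemma 6.5.1 that $M_n(t) \to e^{t^2/2}$ as $n \to \infty$ for all real $t$, and it follows that $Z_n \xrightarrow{L} Z$, where $Z$ is $\mathcal{N}(0, 1)$.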


These examples suggest that, if we take iid rv's with finite variance, and take $A_n = ES_n$, $B_n = \sqrt{\operatorname{var}(S_n)}$, then $B_n^{-1}(S_n - A_n) \xrightarrow{L} Z$, where $Z$ is $\mathcal{N}(0, 1)$. This is the central limit result, which we now prove. The reader should note that in both Examples 1 and 2 we used more than just the existence of $E|X|^2$. Indeed, the mgf exists and hence moments of all orders exist. The existence of the mgf is not a necessary condition.


Theorem 1† (Lindeberg Central Limit Theorem). Let $X_1, X_2, \ldots$ be independent nondegenerate rv's with distribution functions $F_1, F_2, \ldots$. Assume that $EX_k = \mu_k$, var$(X_k) = \sigma_k^2$, and write

$s_n^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$.

If $F_k$ are absolutely continuous with pdf $F_k'$, assume that the relation

(2) $\lim_{n\to\infty} \frac{1}{s_n^2}\sum_{k=1}^{n}\int_{|x - \mu_k| > \varepsilon s_n} (x - \mu_k)^2 F_k'(x)\,dx = 0$

holds for all $\varepsilon > 0$. If $X_k$ are of the discrete type with jump points $x_{kl}$ and jumps $p_{kl}$ $(l = 1, 2, \ldots)$, suppose that the relation

(3) $\lim_{n\to\infty} \frac{1}{s_n^2}\sum_{k=1}^{n}\sum_{|x_{kl} - \mu_k| > \varepsilon s_n} (x_{kl} - \mu_k)^2 p_{kl} = 0$

holds for every $\varepsilon > 0$. Then the distribution of the normalized sum

(4) $S_n^* = \frac{X_1 + X_2 + \cdots + X_n - \sum_{k=1}^{n}\mu_k}{s_n} = \frac{\sum_{k=1}^{n}(X_k - \mu_k)}{s_n}$

converges to the standard normal distribution.

† The reader may omit the proof of Theorem 1 on the first reading. The corollary to Theorem 1 will be used frequently in the rest of the book, however, and the reader should familiarize himself with the application of this result. For an alternative proof of the corollary see Problem 4.


Corollary. Let $X_1, X_2, \ldots$ be iid nondegenerate rv's with common df $F$. Let both $EX_1 = \mu$ and var$(X_1) = \sigma^2$ be finite. Then

(5) $S_n^* \xrightarrow{L} Z$ as $n \to \infty$,

where $Z$ is $\mathcal{N}(0, 1)$ and $S_n^* = (S_n - n\mu)/(\sigma\sqrt{n})$.
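
The statement of the corollary can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the summands are taken to be $U(0, 1)$ (so $\mu = 1/2$, $\sigma^2 = 1/12$), and the function names, the seed, and the sample sizes are hypothetical choices, not part of the text. It compares the simulated df of $S_n^*$ with $\Phi$ at a few points.

```python
import math
import random

def phi(x: float) -> float:
    """Standard normal df, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def standardized_sum(sample, mu, sigma):
    """S_n^* = (S_n - n*mu) / (sigma * sqrt(n))."""
    n = len(sample)
    return (sum(sample) - n * mu) / (sigma * math.sqrt(n))

def simulated_df(n, reps, x, rng):
    """Fraction of simulated S_n^* values that are <= x, for U(0, 1) summands."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and sd of U(0, 1)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        if standardized_sum(sample, mu, sigma) <= x:
            hits += 1
    return hits / reps

rng = random.Random(2024)  # fixed seed, so reruns give the same numbers
for x in (-1.0, 0.0, 1.0):
    est = simulated_df(n=30, reps=5000, x=x, rng=rng)
    print(f"x = {x:+.1f}: simulated {est:.3f}, Phi(x) = {phi(x):.3f}")
```

The simulated values fluctuate around $\Phi(x)$ with a sampling error of order $\text{reps}^{-1/2}$; the simulation illustrates the corollary but of course proves nothing.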


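We first prove that the corollary follows from Theorem 1. If $F$ is absolutely continuous with pdf $f$, then

$\mu = \int_{-\infty}^{\infty} x f(x)\,dx$, $\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x + \mu)\,dx$,

and condition (2) becomes

$\frac{1}{n\sigma^2}\sum_{k=1}^{n}\int_{|x - \mu| > \varepsilon\sigma\sqrt{n}} (x - \mu)^2 f(x)\,dx = \frac{1}{\sigma^2}\int_{|x| > \varepsilon\sigma\sqrt{n}} x^2 f(x + \mu)\,dx \to 0$

as $n \to \infty$. Since $E|X|^2 < \infty$, this last condition holds and the corollary follows from Theorem 1. A similar argument applies in the discrete case.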


A conventional proof of Theorem 1 uses the notion of characteristic functions, which we do not study in this book. It is possible to imitate this proof using mgf's, but this requires the restrictive assumption of existence of all moments. Instead, we will prove Theorem 1 with the help of probability operators. The methods used here utilize the work of Trotter [131]. We refer the reader also to Feller [29], page 256, Lukacs [77], page 213, and Rényi [98], page 515.

Let $C_3$ be the class of all uniformly bounded and uniformly continuous functions defined on $\mathcal{R}$ that can be differentiated three times and whose first, second, and third derivatives are also uniformly bounded and uniformly continuous on $\mathcal{R}$. For $f \in C_3$ let us write

(6) $\|f\| = \sup_{x \in \mathcal{R}} |f(x)|$

and call $\|f\|$ the norm of $f$.
The following result is easy to prove.


Lemma 1. Let $f, g \in C_3$ and $a \in \mathcal{R}$. Then
(i) $f + g \in C_3$ and $af \in C_3$;
(ii) $\|f + g\| \le \|f\| + \|g\|$;
(iii) $\|af\| \le |a|\,\|f\|$.

Definition 3. Let $A$ be a transformation that assigns to each $f \in C_3$ a function $Af \in C_3$. We say that $A$ is a linear operator if it satisfies:
(i) $A(f + g) = Af + Ag$ for all $f, g \in C_3$;
(ii) $A(af) = aAf$ for $f \in C_3$ and $a \in \mathcal{R}$;
and
(iii) there exists a positive number $K$ such that $\|Af\| \le K\|f\|$ holds for all $f \in C_3$.

$A$ is called a contraction operator if (iii) holds for $K = 1$.
For two operators $A$ and $B$ we define $A + B$ and $AB$ by the relations
$(A + B)f = Af + Bf$ and $(AB)f = A(Bf)$.


Definition 4. Let $F$ be the df of an rv $X$. With $F$ we associate an operator $A_F$ as follows. If $F$ has pdf $F'$, then

(7) $A_F f(x) = \int_{-\infty}^{\infty} f(x + y)F'(y)\,dy$ for all $f \in C_3$.

If $F$ is a df with jump points $x_i$ and jumps $p_i$, then

(8) $A_F f(x) = \sum_i p_i f(x + x_i)$ for all $f \in C_3$.

The operator $A_F$ is called the probability operator associated with $F$.


Lemma 2. The probability operator associated with an arbitrary df $F$ is a linear contraction operator.

Proof. The proof is left as an exercise.

Lemma 3. Let $F$ and $G$ be any two df's. Then $A_F$ and $A_G$ commute, and $A_F A_G = A_H$, where $H$ is the convolution df, $H = F * G$.

Proof. The proof is left as an exercise.
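
For discrete df's, definition (8) and Lemmas 2 and 3 can be explored numerically. The following sketch is not a proof and rests on assumptions of our own: the helper names `prob_operator` and `convolve`, the two-point df's `F` and `G`, and the test function $f = \cos$ (which belongs to $C_3$) are all hypothetical choices made for illustration.

```python
import math
from itertools import product

def prob_operator(jumps):
    """Return A_F for a discrete df with jump points x_i and jumps p_i.

    `jumps` maps x_i -> p_i; A_F sends f to the function x -> sum_i p_i f(x + x_i).
    """
    def apply(f):
        return lambda x: sum(p * f(x + xi) for xi, p in jumps.items())
    return apply

def convolve(jumps_f, jumps_g):
    """Jumps of the convolution H = F * G (df of a sum of independent rv's)."""
    out = {}
    for (xf, pf), (xg, pg) in product(jumps_f.items(), jumps_g.items()):
        out[xf + xg] = out.get(xf + xg, 0.0) + pf * pg
    return out

F = {0.0: 0.3, 1.0: 0.7}    # hypothetical two-point df
G = {-1.0: 0.5, 2.0: 0.5}   # hypothetical two-point df
A_F, A_G, A_H = prob_operator(F), prob_operator(G), prob_operator(convolve(F, G))

f = math.cos  # bounded, with bounded derivatives of every order, so f is in C_3
for x in (-2.0, 0.0, 1.5):
    # The three printed values agree: A_F A_G f = A_G A_F f = A_H f.
    print(A_F(A_G(f))(x), A_G(A_F(f))(x), A_H(f)(x))
```

Since $A_F f(x)$ is a weighted average of values of $f$, its absolute value cannot exceed $\|f\|$, which is the contraction property of Lemma 2.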


Lemma 4. Let $X_1, X_2, \ldots, X_n$ be independent rv's with df's $F_1, F_2, \ldots, F_n$, respectively. Also, let $S_n = \sum_{k=1}^{n} X_k$ and let $H$ be the df of $S_n$. Then $H = F_1 * F_2 * \cdots * F_n$, and the associated probability operator $A_H$ is given by

(9) $A_H = A_{F_1}A_{F_2}\cdots A_{F_n}$,

where $A_{F_j}$ is the probability operator associated with $F_j$, $j = 1, 2, \ldots, n$.
Lemma 5. Let $F$ and $G$ be any df's. Then
(10) $\|A_F A_G f\| \le \|A_G f\|$ for all $f \in C_3$.
Lemma 5 follows since $A_F$ is a contraction operator.


Lemma 6. Let $U_1, U_2, \ldots, U_n$ and $V_1, V_2, \ldots, V_n$ be probability operators. Then

(11) $\|U_1U_2\cdots U_n f - V_1V_2\cdots V_n f\| \le \sum_{i=1}^{n}\|U_i f - V_i f\|$.

In particular,

(12) $\|U^n f - V^n f\| \le n\|Uf - Vf\|$.

Proof. For arbitrary linear operators we have the obvious identity

$U_1U_2\cdots U_n - V_1V_2\cdots V_n = \sum_{i=1}^{n} U_1U_2\cdots U_{i-1}(U_i - V_i)V_{i+1}\cdots V_n$.

By the triangle inequality and from the fact that the $U$'s and $V$'s are contraction operators that commute with one another (Lemmas 2 and 3), result (11) follows.


Proof of Theorem 1. Without loss of generality let us assume that $\mu_k = 0$ and $s_n = 1$. Let $U_k$ be the probability operator associated with $F_k$, and $V_k$ be the probability operator associated with the df of an $\mathcal{N}(0, \sigma_k^2)$ rv. Let $F_n^*$ be the df of $S_n^* = \sum_{k=1}^{n} X_k$. Then

$A_{F_n^*} = A_n = U_1U_2\cdots U_n$,

and the operator $B$ associated with

$\Phi(x) = \Phi(x/\sigma_1) * \Phi(x/\sigma_2) * \cdots * \Phi(x/\sigma_n)$

is given by

$B = V_1V_2\cdots V_n$.

Here and subsequently we have written

$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$.

By Lemma 6, for $f \in C_3$,

(13) $\|A_n f - Bf\| \le \sum_{k=1}^{n}\|U_k f - V_k f\|$.
We will show that

(14) $\|A_n f - Bf\| \to 0$ as $n \to \infty$.

By Taylor's formula

(15) $f(x + y) = f(x) + yf'(x) + \frac{y^2}{2}f''(x + \theta_1 y)$,

(16) $f(x + y) = f(x) + yf'(x) + \frac{y^2}{2}f''(x) + \frac{y^3}{6}f'''(x + \theta_2 y)$,

where $0 < \theta_1 < 1$ and $0 < \theta_2 < 1$, and $\theta_1, \theta_2$ depend on $x$ and $y$.
In the following we will assume that $F_k$ is absolutely continuous. Now for $\varepsilon > 0$, we have

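$U_k f(x) = \int_{-\infty}^{\infty} f(x + y)F_k'(y)\,dy = \int_{|y| \le \varepsilon} f(x + y)F_k'(y)\,dy + \int_{|y| > \varepsilon} f(x + y)F_k'(y)\,dy$.

Substituting (16) into the first and (15) into the second integral, we obtain

(17) $U_k f(x) = f(x) + \frac{1}{2}\sigma_k^2 f''(x) + \frac{1}{6}\int_{|y| \le \varepsilon} y^3 f'''(x + \theta_2 y)F_k'(y)\,dy + \frac{1}{2}\int_{|y| > \varepsilon} y^2\,[f''(x + \theta_1 y) - f''(x)]\,F_k'(y)\,dy$.

Letting $M_2 = \sup_x |f''(x)|$ and $M_3 = \sup_x |f'''(x)|$, we get

(18) $\bigl|U_k f(x) - f(x) - \frac{1}{2}\sigma_k^2 f''(x)\bigr| \le \frac{\varepsilon}{6}M_3\sigma_k^2 + M_2\int_{|y| > \varepsilon} y^2 F_k'(y)\,dy$.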

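Also,

$V_k f(x) = \frac{1}{\sigma_k\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x + y)\,e^{-y^2/(2\sigma_k^2)}\,dy$,

so that, on using (16), we have

(19) $\bigl|V_k f(x) - f(x) - \frac{1}{2}\sigma_k^2 f''(x)\bigr| \le M_3\sigma_k^3$.

It follows from (18) and (19) that

$\|U_k f - V_k f\| \le \frac{\varepsilon}{6}M_3\sigma_k^2 + M_2\int_{|y| > \varepsilon} y^2 F_k'(y)\,dy + M_3\sigma_k^3$,

and, summing over $k$ and recalling that $s_n = 1$,

(20) $\sum_{k=1}^{n}\|U_k f - V_k f\| \le \frac{\varepsilon}{6}M_3 + M_2\sum_{k=1}^{n}\int_{|y| > \varepsilon} y^2 F_k'(y)\,dy + M_3\sum_{k=1}^{n}\sigma_k^3$.

Now

$\sigma_k^2 = \int_{|y| \le \varepsilon} y^2 F_k'(y)\,dy + \int_{|y| > \varepsilon} y^2 F_k'(y)\,dy \le \int_{|y| > \varepsilon} y^2 F_k'(y)\,dy + \varepsilon^2$,

and, since $\sum_{k=1}^{n}\sigma_k^2 = 1$,

$\sum_{k=1}^{n}\sigma_k^3 \le \max_{1 \le k \le n}\sigma_k \le \Bigl[\varepsilon^2 + \sum_{k=1}^{n}\int_{|y| > \varepsilon} y^2 F_k'(y)\,dy\Bigr]^{1/2}$.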

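It follows from (20) and condition (2), since $\varepsilon > 0$ is arbitrary, that

$\lim_{n\to\infty}\sum_{k=1}^{n}\|U_k f - V_k f\| = 0$

and hence that (14) holds. In view of the definition of the norm, this implies that

(21) $\lim_{n\to\infty}\int_{-\infty}^{\infty} f(x + y)F_n^{*\prime}(y)\,dy = \int_{-\infty}^{\infty} f(x + y)\Phi'(y)\,dy$

for any $x$ (indeed, uniformly in $x$), $f \in C_3$.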

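Let $\varepsilon > 0$ and consider a function $f_\varepsilon \in C_3$ with these properties: $f_\varepsilon(x) = 1$ for $x \le 0$, $f_\varepsilon(x) = 0$ for $x \ge \varepsilon$, and $f_\varepsilon$ is monotonically decreasing in $(0, \varepsilon)$. Such a function exists. For example, take

$f_\varepsilon(x) = 1$ if $x < 0$,
$f_\varepsilon(x) = [1 - (x/\varepsilon)^4]^4$ if $0 \le x \le \varepsilon$,
$f_\varepsilon(x) = 0$ if $x > \varepsilon$.

Then

$F_n^*(-x) \le \int_{-\infty}^{\infty} f_\varepsilon(x + y)F_n^{*\prime}(y)\,dy \le F_n^*(-x + \varepsilon)$, $x \in \mathcal{R}$.

Similarly,

$\Phi(-x + \varepsilon) \ge \int_{-\infty}^{\infty} f_\varepsilon(x + y)\Phi'(y)\,dy \ge \Phi(-x)$, $x \in \mathcal{R}$.

Using (21), we conclude that

(22) $\limsup_{n\to\infty} F_n^*(-x) \le \lim_{n\to\infty}\int_{-\infty}^{\infty} f_\varepsilon(x + y)F_n^{*\prime}(y)\,dy = \int_{-\infty}^{\infty} f_\varepsilon(x + y)\Phi'(y)\,dy \le \Phi(-x + \varepsilon)$

and

(23) $\liminf_{n\to\infty} F_n^*(-x + \varepsilon) \ge \int_{-\infty}^{\infty} f_\varepsilon(x + y)\Phi'(y)\,dy \ge \Phi(-x)$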
for all real $x$ and arbitrary $\varepsilon > 0$. Rewriting (23) as

(24) $\liminf_{n\to\infty} F_n^*(-x) \ge \Phi(-x - \varepsilon)$,

and combining (22) and (24), we get

$\Phi(-x - \varepsilon) \le \liminf_{n\to\infty} F_n^*(-x) \le \limsup_{n\to\infty} F_n^*(-x) \le \Phi(-x + \varepsilon)$

for all $x \in \mathcal{R}$ and $\varepsilon > 0$. Since $\varepsilon$ is arbitrary, we see that

$\lim_{n\to\infty} F_n^*(-x) = \Phi(-x)$, $x \in \mathcal{R}$,

as asserted.
Exactly similar considerations apply to the discrete case, and the proof is complete.


Remark 1. Feller [27] has shown that Lindeberg's condition (2) or (3) is necessary as well, in the following sense: If $X_1, X_2, \ldots$ are independent rv's with finite standard deviations, and if $F_k$ is the df of $X_k$, then

(25) $\lim_{n\to\infty} P\{S_n^* \le x\} = \Phi(x)$

and

(26) $\lim_{n\to\infty} P\bigl\{\max_{1 \le k \le n}|X_k - E(X_k)| > \varepsilon\sqrt{\operatorname{var}(S_n)}\bigr\} = 0$

hold if and only if (2) or (3) is fulfilled for every $\varepsilon > 0$.
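If (26) holds, we say that the rv's $X_k - EX_k$ are infinitesimal. Relation (26) follows from (2) or (3), since

$P\bigl\{\max_{1 \le k \le n}|X_k - EX_k| > \varepsilon\sqrt{\operatorname{var}(S_n)}\bigr\} \le \sum_{k=1}^{n} P\bigl\{|X_k - EX_k| > \varepsilon\sqrt{\operatorname{var}(S_n)}\bigr\}$

$\le \frac{1}{\varepsilon^2\operatorname{var}(S_n)}\sum_{k=1}^{n}\int_{|x - \mu_k| > \varepsilon s_n}(x - \mu_k)^2 F_k'(x)\,dx$

in the absolutely continuous case, and

$\le \frac{1}{\varepsilon^2 s_n^2}\sum_{k=1}^{n}\sum_{|x_{kl} - \mu_k| > \varepsilon s_n}(x_{kl} - \mu_k)^2 p_{kl}$

in the discrete case. Thus condition (2) or (3) implies that the rv's $(X_k - EX_k)/s_n$ are infinitesimal. We will not prove here the necessity of Lindeberg's condition.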


The following converse to the corollary of Theorem 1 holds. 
Theorem 2. Let $X_1, X_2, \ldots, X_n$ be iid rv's such that $n^{-1/2}S_n$ has the same distribution for every $n = 1, 2, \ldots$. Then, if $EX_i = 0$, var$(X_i) = 1$, the distribution of $X_i$ must be $\mathcal{N}(0, 1)$.
Proof. Let $F$ be the df of $n^{-1/2}S_n$. By the central limit theorem,

$\lim_{n\to\infty} P\{n^{-1/2}S_n \le x\} = \Phi(x)$.

Also, $P\{n^{-1/2}S_n \le x\} = F(x)$ for each $n$. It follows that we must have $F(x) = \Phi(x)$.
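Example 3. Let $X_1, X_2, \ldots$ be iid rv's with common pmf

$P\{X = k\} = p(1 - p)^k$, $k = 0, 1, 2, \ldots$, $0 < p < 1$, $q = 1 - p$.

Then $EX = q/p$, var$(X) = q/p^2$. By the corollary to Theorem 1 we see that

$P\Bigl\{\frac{S_n - n(q/p)}{\sqrt{nq}/p} \le x\Bigr\} \to \Phi(x)$ as $n \to \infty$ for all $x \in \mathcal{R}$.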


Example 4. Let $X_1, X_2, \ldots$ be iid rv's with common $B(\alpha, \beta)$ distribution. Then

$EX = \frac{\alpha}{\alpha + \beta}$ and var$(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$.

By the corollary to Theorem 1, it follows that

$\frac{S_n - n[\alpha/(\alpha + \beta)]}{\sqrt{\alpha\beta n/[(\alpha + \beta + 1)(\alpha + \beta)^2]}} \xrightarrow{L} Z$,

where $Z$ is $\mathcal{N}(0, 1)$.


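Example 5. Let $X_1, X_2, \ldots$ be independent rv's such that $X_k$ is $U(-a_k, a_k)$. Then $EX_k = 0$, var$(X_k) = (1/3)a_k^2$. Suppose that $|a_k| < a$ and $\sum_{k=1}^{n} a_k^2 \to \infty$ as $n \to \infty$. Then

$\frac{1}{s_n^2}\sum_{k=1}^{n}\int_{|x| > \varepsilon s_n} x^2 F_k'(x)\,dx \le \frac{a^2}{s_n^2}\sum_{k=1}^{n}\int_{|x| > \varepsilon s_n}\frac{1}{2a_k}\,dx$

$= \frac{a^2}{s_n^2}\sum_{k=1}^{n} P\{|X_k| > \varepsilon s_n\} \le \frac{a^2}{s_n^2}\sum_{k=1}^{n}\frac{\operatorname{var}(X_k)}{\varepsilon^2 s_n^2}$

$= \frac{a^2}{\varepsilon^2 s_n^2} \to 0$ as $n \to \infty$.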


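If $\sum_{k=1}^{\infty} a_k^2 < \infty$, then $s_n^2 \uparrow A^2$, say, as $n \to \infty$. For fixed $k$, we can find $\varepsilon_k$ such that $\varepsilon_k A < a_k$, and then $P\{|X_k| > \varepsilon_k s_n\} \ge P\{|X_k| > \varepsilon_k A\} > 0$. For $n \ge k$, we have

$\frac{1}{s_n^2}\sum_{j=1}^{n}\int_{|x| > \varepsilon_k s_n} x^2 F_j'(x)\,dx \ge \frac{s_n^2\varepsilon_k^2}{s_n^2}\sum_{j=1}^{n} P\{|X_j| > \varepsilon_k s_n\} \ge \varepsilon_k^2 P\{|X_k| > \varepsilon_k A\} > 0$,
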

so that the Lindeberg condition does not hold. Indeed, if $X_1, X_2, \ldots$ are independent rv's such that there exists a constant $A$ with $P\{|X_n| \le A\} = 1$ for all $n$, the Lindeberg condition (2) or (3) is satisfied if $s_n^2 \to \infty$ as $n \to \infty$. To see this, suppose that $s_n^2 \to \infty$. Since the $X_k$'s are uniformly bounded, so are the rv's $X_k - EX_k$. It follows that for every $\varepsilon > 0$ we can find an $N_\varepsilon$ such that, for $n \ge N_\varepsilon$, $P\{|X_k - EX_k| < \varepsilon s_n,\ k = 1, 2, \ldots, n\} = 1$. The Lindeberg condition follows immediately. The converse also holds, for, if $\lim_{n\to\infty} s_n^2 < \infty$ and the Lindeberg condition holds, there exists a constant $A < \infty$ such that $s_n^2 \to A^2$. For any fixed $j$, we can find an $\varepsilon > 0$ such that $P\{|X_j - \mu_j| > \varepsilon A\} > 0$. Then, for $n \ge j$,

$\frac{1}{s_n^2}\sum_{k=1}^{n}\int_{|x - \mu_k| > \varepsilon s_n}(x - \mu_k)^2 F_k'(x)\,dx \ge \varepsilon^2\sum_{k=1}^{n} P\{|X_k - \mu_k| > \varepsilon s_n\}$

$\ge \varepsilon^2 P\{|X_j - \mu_j| > \varepsilon A\}$

$> 0$,

and the Lindeberg condition does not hold. This contradiction shows that $s_n^2 \to \infty$ is also a necessary condition; that is, for a sequence of uniformly bounded independent rv's, a necessary and sufficient condition for the central limit theorem to hold is $s_n^2 \to \infty$ as $n \to \infty$.
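
The dichotomy in Example 5 can be seen numerically. The sketch below is an illustration under our own assumptions (the function name `lindeberg_ratio` and the two choices of $a_k$ are hypothetical): for $X_k \sim U(-a_k, a_k)$ it evaluates the Lindeberg quantity $s_n^{-2}\sum_k\int_{|x| > \varepsilon s_n} x^2 F_k'(x)\,dx$ in closed form, using $\int_{c < |x| \le a} x^2/(2a)\,dx = (a^3 - c^3)/(3a)$ for $0 \le c < a$.

```python
import math

def lindeberg_ratio(a, eps):
    """(1/s_n^2) * sum_k int_{|x| > eps*s_n} x^2 F_k'(x) dx for X_k ~ U(-a_k, a_k)."""
    s2 = sum(ak * ak / 3.0 for ak in a)      # s_n^2 = sum of the variances a_k^2 / 3
    c = eps * math.sqrt(s2)                  # truncation point eps * s_n
    total = 0.0
    for ak in a:
        if c < ak:
            # int_{c < |x| <= a_k} x^2 / (2 a_k) dx = (a_k^3 - c^3) / (3 a_k)
            total += (ak ** 3 - c ** 3) / (3.0 * ak)
    return total / s2

eps = 0.1
divergent = [1.0] * 1000                         # sum of a_k^2 diverges as n grows
convergent = [2.0 ** -k for k in range(1, 41)]   # sum of a_k^2 stays bounded
print(lindeberg_ratio(divergent, eps))
print(lindeberg_ratio(convergent, eps))
```

With $a_k = 1$ the truncation point $\varepsilon s_n$ eventually exceeds every $a_k$ and the ratio is exactly 0; with $a_k = 2^{-k}$ the ratio stays bounded away from 0, in line with the necessity of $s_n^2 \to \infty$.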


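Example 6. Let $X_1, X_2, \ldots$ be independent rv's such that $\alpha_k = E|X_k|^{2+\delta} < \infty$ for some $\delta > 0$ and $\alpha_1 + \alpha_2 + \cdots + \alpha_n = o(s_n^{2+\delta})$. Then the Lindeberg condition is satisfied, and the central limit theorem holds. This result is due to Lyapunov. We have

$\frac{1}{s_n^2}\sum_{k=1}^{n}\int_{|x| > \varepsilon s_n} x^2 F_k'(x)\,dx \le \frac{1}{\varepsilon^\delta s_n^{2+\delta}}\sum_{k=1}^{n}\int_{-\infty}^{\infty}|x|^{2+\delta}F_k'(x)\,dx$

$= \frac{\sum_{k=1}^{n}\alpha_k}{\varepsilon^\delta s_n^{2+\delta}} \to 0$ as $n \to \infty$.

A similar argument applies in the discrete case.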


Remark 2. Let $X_1, X_2, \ldots$ be a sequence of iid rv's. The condition that $E|X|^2 < \infty$ for the central limit theorem (corollary to Theorem 1) to hold is sufficient but not necessary. It is possible, however, to obtain a necessary and sufficient condition for the central limit result to hold. We refer the reader to Feller [29], page 313, for details.


Example 7. It is known (see, for example, Rohatgi [101]) that, if $X_1, X_2, \ldots$ is a sequence of iid rv's with common law $\mathcal{L}(X)$ such that

$P\{|X| > x\} = \frac{L(x)}{x^2}$, $x > 0$,

where $L$ is a function of slow variation (that is, $L(cx)/L(x) \to 1$ as $x \to \infty$ for all $c > 0$), then $X_i$ belong to the domain of attraction of the normal distribution. Thus, for example, if $X$ is an rv with pdf

$f(x) = 2|x|^{-3}\log|x|$ for $|x| > 1$, $f(x) = 0$ for $|x| \le 1$,
then $S_n/(\sqrt{n/2}\,\log n) \xrightarrow{L} Z$, where $Z$ is $\mathcal{N}(0, 1)$. (See Feller [29], 260.) Note that $E|X|^2 = \infty$.


Remark 3. Both the central limit theorem (CLT) and the (weak) law of large numbers (WLLN) hold for a large class of sequences of rv's $\{X_n\}$. If the $\{X_n\}$ are independent uniformly bounded rv's, that is, if $P\{|X_n| \le M\} = 1$, the WLLN (Theorem 6.3.1) holds; the CLT holds provided that $s_n^2 \to \infty$ (Example 5).

If the rv's $\{X_n\}$ are iid, then the CLT is a stronger result than the WLLN in that the former provides an estimate of the probability $P\{|S_n - n\mu|/n \ge \varepsilon\}$. Indeed,

$P\{|S_n - n\mu| > n\varepsilon\} = P\Bigl\{\frac{|S_n - n\mu|}{\sigma\sqrt{n}} > \frac{\varepsilon}{\sigma}\sqrt{n}\Bigr\} \approx 1 - P\Bigl\{|Z| \le \frac{\varepsilon}{\sigma}\sqrt{n}\Bigr\}$,

where $Z$ is $\mathcal{N}(0, 1)$, and the law of large numbers follows. On the other hand, we note that the WLLN does not require the existence of a second moment.


Remark 4. If $\{X_n\}$ are independent rv's, it is quite possible that the CLT may apply to the $X_n$'s, but not the WLLN.


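Example 8 (Feller [28], 255). Let $\{X_k\}$ be independent rv's with pmf

$P\{X_k = k^\lambda\} = P\{X_k = -k^\lambda\} = \frac{1}{2}$, $k = 1, 2, \ldots$.

Then $EX_k = 0$, var$(X_k) = k^{2\lambda}$. Also let $\lambda > 0$; then

$s_n^2 = \sum_{k=1}^{n} k^{2\lambda} \le \int_0^{n+1} x^{2\lambda}\,dx = \frac{(n + 1)^{2\lambda + 1}}{2\lambda + 1}$.

It follows that, if $0 < \lambda < \frac{1}{2}$, $s_n/n \to 0$, and by Corollary 2 to Theorem 6.3.1 the WLLN holds. Now $k^\lambda \le n^\lambda$, so that the sum $\sum_{k=1}^{n}\sum_{|x_{kl}| > \varepsilon s_n} x_{kl}^2 p_{kl}$ will be nonzero if $n^\lambda > \varepsilon s_n \approx \varepsilon[n^{2\lambda + 1}/(2\lambda + 1)]^{1/2}$. It follows that, as long as $n > (2\lambda + 1)\varepsilon^{-2}$,

$\frac{1}{s_n^2}\sum_{k=1}^{n}\sum_{|x_{kl}| > \varepsilon s_n} x_{kl}^2 p_{kl} = 0$

and the Lindeberg condition holds. Thus the CLT holds for $\lambda > 0$. This means that

$P\Bigl\{a < \frac{S_n}{s_n} < b\Bigr\} \to \int_a^b \frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt$.

Thus

$P\Bigl\{\frac{a\,n^{\lambda - 1/2}}{\sqrt{2\lambda + 1}} < \frac{S_n}{n} < \frac{b\,n^{\lambda - 1/2}}{\sqrt{2\lambda + 1}}\Bigr\} \to \int_a^b \frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt$,

and the WLLN cannot hold for $\lambda \ge \frac{1}{2}$.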


We conclude this section with some remarks concerning the application of the CLT. Let $X_1, X_2, \ldots$ be iid rv's with common mean $\mu$ and variance $\sigma^2$. Let us write

$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$,

and let $z_1, z_2$ be two arbitrary real numbers with $z_1 < z_2$. If $F_n$ is the df of $Z_n$, then

$\lim_{n\to\infty} P\{z_1 < Z_n \le z_2\} = \lim_{n\to\infty}[F_n(z_2) - F_n(z_1)] = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-t^2/2}\,dt$,

that is,

(27) $\lim_{n\to\infty} P\{z_1\sigma\sqrt{n} + n\mu < S_n \le z_2\sigma\sqrt{n} + n\mu\} = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-t^2/2}\,dt$.

It follows that the rv $S_n = \sum_{k=1}^{n} X_k$ is asymptotically normally distributed with mean $n\mu$ and variance $n\sigma^2$. Equivalently, the rv $n^{-1}S_n$ is asymptotically $\mathcal{N}(\mu, \sigma^2/n)$. This result is of great importance in statistics.


Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's. Then $S_n$ is asymptotically $\mathcal{N}(np, np(1 - p))$. Thus for large enough $n$ we can estimate $P\{S_n \le x\}$ by

$P\Bigl\{\frac{S_n - np}{\sqrt{np(1 - p)}} \le \frac{x - np}{\sqrt{np(1 - p)}}\Bigr\} \approx P\Bigl\{Z \le \frac{x - np}{\sqrt{np(1 - p)}}\Bigr\}$.

In practice, $n > 20$ suffices. In particular, if $n = 25$, $p = \frac{1}{2}$, then

$P\{S_n \le 12\} = P\Bigl\{\frac{S_n - 25/2}{5/2} \le -\frac{1}{5}\Bigr\} \approx P\{Z \le -.2\} = .421$,

where $Z$ is $\mathcal{N}(0, 1)$.


A somewhat better approximation results if we apply the continuity correction. If $X$ is an integer-valued rv, then $P\{x_1 \le X \le x_2\}$, where $x_1$ and $x_2$ are integers, is exactly the same as $P\{x_1 - \frac{1}{2} < X < x_2 + \frac{1}{2}\}$. This amounts to making the discrete space of values of $X$ continuous by considering intervals of length 1 with midpoints at integers.

In Example 9, we have

$P\{S_n \le 12\} = P\{S_n < 12.5\} \approx P\{Z < 0\} = .50$,

which is the exact probability of at most 12 successes in 25 trials with a fair coin.
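
The effect of the continuity correction in Example 9 can be reproduced with a few lines of code. This is a minimal sketch with hypothetical helper names (`binom_df`, `normal_approx_df`); it compares the exact binomial probability with the normal approximation, with and without the correction.

```python
import math

def phi(z: float) -> float:
    """Standard normal df, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_df(x: int, n: int, p: float) -> float:
    """Exact P{S_n <= x} for S_n ~ b(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_approx_df(x: int, n: int, p: float, continuity: bool = False) -> float:
    """Normal approximation to P{S_n <= x}, optionally using x + 1/2."""
    shift = 0.5 if continuity else 0.0
    return phi((x + shift - n * p) / math.sqrt(n * p * (1 - p)))

n, p, x = 25, 0.5, 12
print("exact              :", binom_df(x, n, p))
print("normal, uncorrected:", normal_approx_df(x, n, p))
print("normal, corrected  :", normal_approx_df(x, n, p, continuity=True))
```

For this symmetric case the corrected value coincides with the exact probability; for other $n$, $p$, and $x$ the correction typically reduces the error but does not remove it.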


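Example 10. Let $X_1, X_2, \ldots$ be iid $P(\lambda)$ rv's. Then $S_n$ is approximately $\mathcal{N}(n\lambda, n\lambda)$ for large $n$. If $\lambda = 1$, $n = 64$, then

$P\{50 < S_n \le 80\} = P\{51 \le S_n \le 80\}$
$= P\{50.5 < S_n < 80.5\}$
$= P\Bigl\{-\frac{13.5}{8} < \frac{S_n - 64}{8} < \frac{16.5}{8}\Bigr\}$
$\approx P\{-1.69 < Z < 2.06\}$
$= .9348$.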

PROBLEMS 6.6 


1. Let $\{X_n\}$ be a sequence of independent rv's with the following distributions. In each case, does the Lindeberg condition hold?

(a) $P\{X_n = \pm(1/2^n)\} = \frac{1}{2}$.
(b) $P\{X_n = \pm 2^{n+1}\} = 1/2^{n+3}$, $P\{X_n = 0\} = 1 - (1/2^{n+2})$.
(c) $P\{X_n = \pm 1\} = (1 - 2^{-n})/2$, $P\{X_n = \pm 2^{-n}\} = 1/2^{n+1}$.
(d) $\{X_n\}$ is a sequence of independent Poisson rv's with parameter $\lambda_n$, $n = 1, 2, \ldots$, such that $\sum_{k=1}^{n}\lambda_k \to \infty$.
(e) $P\{X_n = \pm 2^n\} =$
2. Let $X_1, X_2, \ldots$ be iid rv's with mean 0, variance 1, and $EX_i^4 < \infty$. Find the limiting distribution of

$Z_n = \sqrt{n}\,\frac{X_1X_2 + X_3X_4 + \cdots + X_{2n-1}X_{2n}}{X_1^2 + X_2^2 + \cdots + X_{2n}^2}$.
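3. Let $X_1, X_2, \ldots$ be iid rv's with mean $\alpha$ and variance $\sigma^2$, and let $Y_1, Y_2, \ldots$ be iid rv's with mean $\beta$ $(\ne 0)$ and variance $\tau^2$. Find the limiting distribution of $Z_n = \sqrt{n}(\bar{X}_n - \alpha)/\bar{Y}_n$, where $\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i$ and $\bar{Y}_n = n^{-1}\sum_{i=1}^{n} Y_i$.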
4. Let $\{X_n\}$ be a sequence of iid rv's with mean $\mu$ and variance $\sigma^2$. Let $M(t)$ be the common mgf that is assumed to exist for $|t| < h$, $h > 0$. Also, let $Y_n = (S_n - n\mu)/(\sigma\sqrt{n})$, and write $M_n(t)$ for the mgf of $Y_n$. Use the continuity theorem to show that

$M_n(t) \to e^{t^2/2}$ as $n \to \infty$ for all real $t$.

[Hint: The existence of $M(t)$ for $|t| < h$ implies the existence of moments of all orders. Expand $M_n(t)$ by Taylor's theorem with remainder.]

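5. Let $X_1, X_2, \ldots$ be a sequence of iid rv's with common mean $\mu$ and variance $\sigma^2$. Also, let $\bar{X} = n^{-1}\sum_{k=1}^{n} X_k$ and $S^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Show that $\sqrt{n}(\bar{X} - \mu)/S \xrightarrow{L} Z$, where $Z \sim \mathcal{N}(0, 1)$.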

6. Let $X_1, X_2, \ldots, X_n$ be iid rv's with mean 75 and variance 225. Use Chebyshev's inequality to calculate the probability that the sample mean will not differ from the population mean by more than 6. Then use the CLT to calculate the same probability, and compare your results.


7. Let $X \sim b(n, \theta)$. Use the CLT to find $n$ such that $P_\theta\{X > n/2\} \ge 1 - \alpha$. In particular, let $\alpha = .10$ and $\theta = .45$. Calculate $n$, satisfying $P\{X > n/2\} \ge .90$.


8. Let $X_1, X_2, \ldots, X_{100}$ be iid $P(\lambda)$ rv's, where $\lambda = .02$. Let $S = S_{100} = \sum_{i=1}^{100} X_i$. Use the central limit result to evaluate $P\{S \ge 3\}$, and compare your result to the exact probability of the event $S \ge 3$.


9. Let $X_1, X_2, \ldots, X_n$ be iid rv's with mean 54 and variance 225. Use Chebyshev's inequality to find the possible difference between the sample mean and the population mean with a probability of at least .75. Also use the CLT to do the same.


10. Let $X_1, X_2, \ldots$ be a sequence of iid rv's with mean $\mu$ and variance $\sigma^2$, and assume that $EX_1^4 < \infty$. Write $V_n = \sum_{k=1}^{n}(X_k - \mu)^2$. Find the centering and norming constants $A_n$ and $B_n$ such that $B_n^{-1}(V_n - A_n) \xrightarrow{L} Z$, where $Z$ is $\mathcal{N}(0, 1)$.


11. From an urn containing 10 identical balls numbered 0 through 9, $n$ balls are drawn with replacement.

(a) What does the law of large numbers tell you about the appearance of 0's in the $n$ drawings?

(b) How many drawings must be made in order that, with probability at least .95, the relative frequency of the occurrence of 0's will be between .09 and .11?

(c) Use the CLT to find the probability that among the $n$ numbers thus chosen the number 5 will appear between $(n - 3\sqrt{n})/10$ and $(n + 3\sqrt{n})/10$ times (inclusive) if (i) $n = 25$, (ii) $n = 100$.
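12. Let $X_1, X_2, \ldots, X_n$ be iid rv's with $EX_1 = 0$ and $EX_1^2 = \sigma^2 < \infty$. Let $\bar{X} = \sum_{i=1}^{n} X_i/n$, and for any positive real number $\varepsilon$ let $P_{n,\varepsilon} = P\{\bar{X} \ge \varepsilon\}$. Show that

$P_{n,\varepsilon} \approx \frac{\sigma}{\varepsilon\sqrt{2\pi n}}\,e^{-n\varepsilon^2/(2\sigma^2)}$ as $n \to \infty$.

[Hint: Use (5.3.68).]
13. Prove Lemmas 1 through 4.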


CHAPTER 7 


Sample Moments and 
Their Distributions 


7.1 INTRODUCTION 


In the preceding chapters we discussed fundamental ideas and techniques of probability theory. In this development we created a mathematical model of a random experiment by associating with it a sample space in which random events correspond to sets of a certain $\sigma$-field. The notion of probability defined on this $\sigma$-field corresponds to the notion of uncertainty in the outcome on any performance of the random experiment.
In this chapter we begin the study of some problems of mathematical statistics. The methods of probability theory learned in preceding chapters will be used extensively in this study.

Suppose that we seek information about some numerical characteristics of 
a collection of elements, called a population. For reasons of time or cost we may 
not wish or be able to study each individual element of the population. Our 
object is to draw conclusions about the unknown population characteristics 
on the basis of information on some characteristics of a suitably selected 
sample. Formally, let X be a random variable which describes the population 
under investigation, and let F be the df of X. There are two possibilities. Either 
$X$ has a df $F_\theta$ with a known functional form (except perhaps for the parameter $\theta$, which may be a vector), or $X$ has a df $F$ about which we know nothing (except perhaps that $F$ is, say, absolutely continuous). In the former case let $\Theta$ be the set of possible values of the unknown parameter $\theta$. Then the job of a statistician is to decide, on the basis of a suitably selected sample, which member or members of the family $\{F_\theta, \theta \in \Theta\}$ can represent the df of $X$. Problems of this type are called problems of parametric statistical inference and will be the subject of investigation in Chapters 8 through 12. The case in which nothing is known about the functional form of the df $F$ of $X$ is clearly much more difficult. Inference problems of this type fall into the domain of nonparametric statistics and will be discussed in Chapter 13.

To be sure, the scope of statistical methods is much wider than the statistical inference problems discussed in this book. Statisticians, for example, deal with problems of planning and designing experiments, of collecting information, and of deciding how best the collected information should be used. However, here we concern ourselves only with the best methods of making inferences about probability distributions.

In Section 2 of this chapter we introduce the notions of a (simple) random sample and sample statistics. In Section 3 we study sample moments and their distributions, and in Section 4 we consider some important distributions that arise in sampling from a normal population. Sections 5 and 6 are devoted to the study of sampling from univariate and bivariate normal distributions.


7.2 RANDOM SAMPLING 


Consider a statistical experiment that culminates in outcomes $x$, which are the values assumed by an rv $X$. Let $F$ be the df of $X$. In practice, $F$ will not be completely known, that is, one or more parameters associated with $F$ will be unknown. The job of a statistician is to estimate these unknown parameters or to test the validity of certain statements about them. He can obtain $n$ independent observations on $X$. This means that he observes $n$ values $x_1, x_2, \ldots, x_n$ assumed by the rv $X$. Each $x_i$ can be regarded as the value assumed by an rv $X_i$, $i = 1, 2, \ldots, n$, where $X_1, X_2, \ldots, X_n$ are independent rv's with common df $F$. The observed values $(x_1, x_2, \ldots, x_n)$ are then values assumed by $(X_1, X_2, \ldots, X_n)$. The set $\{X_1, X_2, \ldots, X_n\}$ is then a sample of size $n$ taken from a population distribution $F$. The set of $n$ values $x_1, x_2, \ldots, x_n$ is called a realization of the sample. Note that the possible values of the random vector $(X_1, X_2, \ldots, X_n)$ can be regarded as points in $\mathcal{R}_n$, which may be called the sample space. In practice one observes not $x_1, x_2, \ldots, x_n$ but some function $f(x_1, x_2, \ldots, x_n)$. Then $f(x_1, x_2, \ldots, x_n)$ are values assumed by the rv $f(X_1, X_2, \ldots, X_n)$.
Let us now formalize these concepts.


Definition 1. Let $X$ be an rv with df $F$, and let $X_1, X_2, \ldots, X_n$ be iid rv's with common df $F$. Then the collection $X_1, X_2, \ldots, X_n$ is known as a random sample of size $n$ from the df $F$ or simply as $n$ independent observations on $X$.

If $X_1, X_2, \ldots, X_n$ is a random sample from $F$, the joint df is given by

(1) $F^*(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F(x_i)$.
Definition 2. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on an rv $X$, and let $f\colon \mathcal{R}_n \to \mathcal{R}_k$ be a Borel-measurable function. Then the rv $f(X_1, X_2, \ldots, X_n)$ is called a (sample) statistic provided that it is not a function of any unknown parameter(s).


Two of the most commonly used statistics are defined as follows. 


Definition 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution function $F$. Then the statistic

(2) $\bar{X} = n^{-1}S_n = \sum_{i=1}^{n}\frac{X_i}{n}$

is called the sample mean, and the statistic

(3) $S^2 = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$

is called the sample variance.


Remark 1. Whenever the word “sample” is used subsequently, it will mean 
"random sample." 


Remark 2. It should be remembered that sample statistics $\bar{X}$, $S^2$ (and others that we will define later on) are random variables, while the population parameters $\mu$, $\sigma^2$, and so on are fixed constants that may be unknown.


Remark 3. In (3) we divide by $n - 1$ rather than $n$. The reason for this will become clear in the next section.


Remark 4. Other frequently occurring examples of statistics are sample order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ and their functions, as well as sample moments, which will be studied in the next section.


Example 1. Let $X \sim b(1, p)$, where $p$ is possibly unknown. The df of $X$ is given by

$F(x) = p\,\varepsilon(x - 1) + (1 - p)\,\varepsilon(x)$, $x \in \mathcal{R}$.

Suppose that five independent observations on $X$ are 0, 1, 1, 1, 0. Then 0, 1, 1, 1, 0 is a realization of the sample $X_1, X_2, \ldots, X_5$. The sample mean is

$\bar{x} = \frac{0 + 1 + 1 + 1 + 0}{5} = .6$,

which is the value assumed by the rv $\bar{X}$. The sample variance is

$s^2 = \sum_{i=1}^{5}\frac{(x_i - \bar{x})^2}{5 - 1} = \frac{2(.6)^2 + 3(.4)^2}{4} = .3$,

which is the value assumed by the rv $S^2$.


Example 2. Let $X \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ is known but $\sigma^2$ is unknown. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then, according to our definition, $\sum_{i=1}^{n} X_i/\sigma^2$ is not a statistic.

Suppose that five observations on $X$ are $-.864$, $.561$, $2.355$, $.582$, $-.774$. Then the sample mean is .372, and the sample variance, computed from (3) with divisor $n - 1 = 4$, is 1.713.
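
Definitions (2) and (3) are easy to evaluate by machine. The sketch below uses Python's standard `statistics` module, whose `variance` function divides by $n - 1$ as in (3); the wrapper name `sample_mean_and_variance` is a hypothetical choice, and the two data sets are the realizations quoted in Examples 1 and 2.

```python
import statistics

def sample_mean_and_variance(observations):
    """Return (x-bar, s^2), where s^2 uses the divisor n - 1 as in (3)."""
    xbar = statistics.mean(observations)
    s2 = statistics.variance(observations)  # sample variance, divisor n - 1
    return xbar, s2

bernoulli_obs = [0, 1, 1, 1, 0]                     # realization from Example 1
normal_obs = [-0.864, 0.561, 2.355, 0.582, -0.774]  # realization from Example 2

for name, obs in (("Example 1", bernoulli_obs), ("Example 2", normal_obs)):
    xbar, s2 = sample_mean_and_variance(obs)
    print(f"{name}: sample mean = {xbar:.3f}, sample variance = {s2:.3f}")
```

Note that `statistics.pvariance` would divide by $n$ instead, giving the second sample central moment rather than $S^2$.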


PROBLEMS 7.2 


1. Let $X$ be a $b(1, \frac{1}{2})$ rv, and consider all possible random samples of size 3 on $X$. Compute $\bar{X}$ and $S^2$ for each of the eight samples, and also compute the pmf's of $\bar{X}$ and $S^2$.

2. A die is rolled. Let $X$ be the face value that turns up, and $X_1, X_2$ be two independent observations on $X$. Compute the pmf of $\bar{X}$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from some population. Show that

$\max_{1 \le i \le n}|X_i - \bar{X}| < \frac{(n - 1)S}{\sqrt{n}}$

unless either all the $n$ observations are equal or exactly $n - 1$ of the $X_i$'s are equal. (Samuelson [109])


7.3 SAMPLE CHARACTERISTICS AND THEIR DISTRIBUTIONS 


Let $X_1, X_2, \ldots, X_n$ be a sample from a population df $F$. In this section we consider some commonly used sample characteristics and their distributions.


Definition 1. Let $F_n^*(x) = n^{-1}\sum_{j=1}^{n}\varepsilon(x - X_j)$. Then $nF_n^*(x)$ is the number of $X_k$'s $(1 \le k \le n)$ that are $\le x$. $F_n^*(x)$ is called the sample (or empirical) distribution function.

If $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is the order statistic for $X_1, X_2, \ldots, X_n$, then clearly

(1) $F_n^*(x) = 0$ if $x < X_{(1)}$,
$F_n^*(x) = k/n$ if $X_{(k)} \le x < X_{(k+1)}$ $(k = 1, 2, \ldots, n - 1)$,
$F_n^*(x) = 1$ if $x \ge X_{(n)}$.

For fixed but otherwise arbitrary $x \in \mathcal{R}$, $F_n^*(x)$ itself is an rv. The following result is immediate.
Theorem 1. The rv $F_n^*(x)$ has the probability function

(2) $P\Bigl\{F_n^*(x) = \frac{j}{n}\Bigr\} = \binom{n}{j}[F(x)]^j[1 - F(x)]^{n-j}$, $j = 0, 1, \ldots, n$,

with mean

(3) $EF_n^*(x) = F(x)$

and variance

(4) var$(F_n^*(x)) = \frac{F(x)[1 - F(x)]}{n}$.

Proof. Since $\varepsilon(x - X_j)$, $j = 1, 2, \ldots, n$, are iid rv's, each with pmf

$P\{\varepsilon(x - X_j) = 1\} = P\{x - X_j \ge 0\} = F(x)$

and

$P\{\varepsilon(x - X_j) = 0\} = 1 - F(x)$,

their sum $nF_n^*(x)$ is a $b(n, p)$ rv, where $p = F(x)$. Relations (2), (3), and (4) follow immediately.


Corollary 1.

$F_n^*(x) \xrightarrow{P} F(x)$ as $n \to \infty$.

Corollary 2.

$\frac{\sqrt{n}\,[F_n^*(x) - F(x)]}{\sqrt{F(x)[1 - F(x)]}} \xrightarrow{L} Z$ as $n \to \infty$,

where $Z$ is $\mathcal{N}(0, 1)$.

Corollary 1 follows from the WLLN, and Corollary 2 from the CLT.
The convergence in Corollary 1 is for each value of $x$. It is possible to make a probability statement simultaneously for all $x$. We state the result without proof.


Theorem 2 (Glivenko-Cantelli Theorem). $F_n^*(x)$ converges uniformly to $F(x)$, that is, for $\varepsilon > 0$,

$\lim_{n\to\infty} P\bigl\{\sup_{-\infty < x < \infty}|F_n^*(x) - F(x)| > \varepsilon\bigr\} = 0$.

Proof. For a proof of Theorem 2 we refer to Fisz [32], page 391.


We next consider some typical values of the df $F_n^*(x)$, called sample statistics. Since $F_n^*(x)$ has jump points $X_j$, $j = 1, 2, \ldots, n$, it is clear that all moments of $F_n^*$ exist. Let us write

(5) $a_k = n^{-1}\sum_{j=1}^{n} X_j^k$

for the moment of order $k$ about 0. Here $a_k$ will be called the sample moment of order $k$. In this notation

(6) $a_1 = n^{-1}\sum_{j=1}^{n} X_j = \bar{X}$.

The sample central moments are defined by

(7) $b_k = n^{-1}\sum_{j=1}^{n}(X_j - a_1)^k = n^{-1}\sum_{j=1}^{n}(X_j - \bar{X})^k$.

Clearly,

$b_1 = 0$ and $b_2 = \Bigl(\frac{n - 1}{n}\Bigr)S^2$.


As mentioned earlier, we do not call $b_2$ the sample variance. $S^2$ will be referred to as the sample variance for reasons that will subsequently become clear. We have

(8) $b_2 = a_2 - a_1^2$.

For the mgf of df $F_n^*$, we have

(9) $M^*(t) = n^{-1}\sum_{j=1}^{n} e^{tX_j}$.


Similar definitions are made for sample moments of bivariate and multivariate distributions. For example, if $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is a sample from a bivariate distribution, we write

(10) $\bar{X} = n^{-1}\sum_{j=1}^{n} X_j$, $\bar{Y} = n^{-1}\sum_{j=1}^{n} Y_j$

for the two sample means, and for the second-order sample central moments we write

(11) $b_{20} = n^{-1}\sum_{j=1}^{n}(X_j - \bar{X})^2$, $b_{02} = n^{-1}\sum_{j=1}^{n}(Y_j - \bar{Y})^2$, $b_{11} = n^{-1}\sum_{j=1}^{n}(X_j - \bar{X})(Y_j - \bar{Y})$.

Once again we write

(12) $S_1^2 = (n - 1)^{-1}\sum_{j=1}^{n}(X_j - \bar{X})^2$, $S_2^2 = (n - 1)^{-1}\sum_{j=1}^{n}(Y_j - \bar{Y})^2$

for the two sample variances, and for the sample covariance we use the quantity

(13) $S_{11} = (n - 1)^{-1}\sum_{j=1}^{n}(X_j - \bar{X})(Y_j - \bar{Y})$.

In particular, the sample correlation coefficient is defined by

(14) $R = \frac{b_{11}}{\sqrt{b_{20}b_{02}}} = \frac{S_{11}}{S_1S_2}$.

It can be shown (Problem 4) that $|R| \le 1$, the extreme values $\pm 1$ can occur only when all sample points $(X_1, Y_1), \ldots, (X_n, Y_n)$ lie on a straight line.

One can also work out the formulas for the two lines of regression. Thus
the line of regression of Y on X can be shown to be
\[ y - \bar{Y} = R\, \frac{S_2}{S_1} (x - \bar{X}), \tag{15} \]
and is called the sample (linear) regression of Y on X. $R S_2 / S_1$ is called the
sample regression coefficient of Y on X. Similar discussion applies to the
sample regression line of X on Y.
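As an illustration of the sample covariance, the sample correlation coefficient, and the sample regression coefficient of Y on X, here is a minimal sketch; the function name `bivariate_summary` and the data are hypothetical.

```python
from math import sqrt

def bivariate_summary(xs, ys):
    """Return (Xbar, Ybar, R, R * S_2 / S_1) for paired observations."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s1_sq = sum((x - xbar) ** 2 for x in xs) / (n - 1)    # S_1^2
    s2_sq = sum((y - ybar) ** 2 for y in ys) / (n - 1)    # S_2^2
    s11 = sum((x - xbar) * (y - ybar)
              for x, y in zip(xs, ys)) / (n - 1)          # S_11
    r = s11 / sqrt(s1_sq * s2_sq)                         # sample correlation
    slope = r * sqrt(s2_sq / s1_sq)                       # regression coefficient
    return xbar, ybar, r, slope

# Points on a straight line with positive slope give R = 1.
print(bivariate_summary([1, 2, 3, 4], [3, 5, 7, 9]))
# A sample that is not collinear gives |R| < 1.
print(bivariate_summary([1, 2, 3], [1, 3, 2]))
```

The fitted line is then $y = \bar{Y} + \text{slope}\,(x - \bar{X})$.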

The sample quantiles are defined in a similar manner. Thus, if 0 < p < 1,
the sample quantile of order p, denoted by $Z_p$, is the order statistic $X_{(r)}$, where
\[ r = \begin{cases} np & \text{if } np \text{ is an integer,} \\ [np + 1] & \text{if } np \text{ is not an integer.} \end{cases} \tag{16} \]
As usual, [x] is the largest integer $\le x$. Note that, if np is an integer, we
can take any value between $X_{(np)}$ and $X_{(np+1)}$ as the pth sample quantile.
Thus, if p = 1/2 and n is even, we can take any value between $X_{(n/2)}$ and
$X_{(n/2)+1}$, the two middle values, as the median. It is customary to take the
average. Thus the sample median is defined as
\[ Z_{1/2} = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd,} \\ \dfrac{X_{(n/2)} + X_{((n/2)+1)}}{2} & \text{if } n \text{ is even.} \end{cases} \]
Note that
\[ \Big[\frac{n}{2} + 1\Big] = \frac{n+1}{2} \]
if n is odd.
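The rule for r translates directly into code. The sketch below is illustrative only: the names `sample_quantile` and `sample_median` are assumptions, and p is converted to an exact fraction so that the test "np is an integer" is not affected by floating-point rounding (pass p as a `Fraction` when it is not exactly representable as a float).

```python
from fractions import Fraction
from math import floor

def sample_quantile(sample, p):
    """Order statistic X_(r), r = np if np is an integer, else [np + 1]."""
    xs = sorted(sample)
    np_ = len(xs) * Fraction(p)
    r = int(np_) if np_.denominator == 1 else floor(np_) + 1
    return xs[r - 1]          # X_(r) in 1-based order-statistic notation

def sample_median(sample):
    """Middle order statistic if n is odd, average of the two middle ones if even."""
    xs = sorted(sample)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

data = [5, 1, 4, 2, 3]
print(sample_quantile(data, Fraction(1, 2)))   # np = 2.5, so r = 3
print(sample_quantile(data, Fraction(2, 5)))   # np = 2 is an integer, so r = 2
print(sample_median([1, 2, 3, 4]))             # average of the two middle values
```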


Next we consider the moments of sample characteristics. In the following
we write $EX^k = m_k$ and $E(X - \mu)^k = \mu_k$ for the kth-order population
moments. Whenever we use $m_k$ (or $\mu_k$), it will be assumed to exist. Also, $\sigma^2$
represents the population variance.

Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with df F.
Then
\[ E\bar{X} = \mu, \tag{17} \]
\[ \operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}, \tag{18} \]
\[ E(\bar{X})^3 = \frac{m_3 + 3(n-1) m_2 \mu + (n-1)(n-2) \mu^3}{n^2}, \tag{19} \]
and
\[ E(\bar{X})^4 = \frac{m_4 + 4(n-1) m_3 \mu + 6(n-1)(n-2) m_2 \mu^2 + 3(n-1) m_2^2 + (n-1)(n-2)(n-3) \mu^4}{n^3}. \tag{20} \]

Proof. In view of Theorems 4.6.1 and 4.6.5, it suffices to prove (19) and
(20). We have
\[ \Big(\sum_{j=1}^{n} X_j\Big)^3 = \sum_{j} X_j^3 + 3 \sum_{j \ne k} X_j^2 X_k + \sum_{j \ne k \ne l} X_j X_k X_l, \]
and (19) follows. Similarly
\[ \Big(\sum_{i=1}^{n} X_i\Big)^4 = \sum_{i} X_i^4 + 4 \sum_{j \ne k} X_j X_k^3 + 3 \sum_{j \ne k} X_j^2 X_k^2 + 6 \sum_{i \ne j \ne k} X_i^2 X_j X_k + \sum_{i \ne j \ne k \ne l} X_i X_j X_k X_l, \]
and (20) follows.


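Theorem 4. For the third and fourth central moments of $\bar{X}$, we have
\[ \mu_3(\bar{X}) = \frac{\mu_3}{n^2} \tag{21} \]
and
\[ \mu_4(\bar{X}) = \frac{\mu_4}{n^3} + \frac{3(n-1)\mu_2^2}{n^3}. \tag{22} \]

Proof. We have
\[ \mu_3(\bar{X}) = E(\bar{X} - \mu)^3 = \frac{1}{n^3} E\Big\{\sum_{i=1}^{n} (X_i - \mu)\Big\}^3 = \frac{1}{n^3} \sum_{i=1}^{n} E(X_i - \mu)^3 = \frac{\mu_3}{n^2}, \]
and
\begin{align*}
\mu_4(\bar{X}) &= E(\bar{X} - \mu)^4 = \frac{1}{n^4} E\Big\{\sum_{i=1}^{n} (X_i - \mu)\Big\}^4 \\
&= \frac{1}{n^4} \sum_{i=1}^{n} E(X_i - \mu)^4 + \binom{4}{2} \frac{1}{n^4} \sum_{i<j} E\{(X_i - \mu)^2 (X_j - \mu)^2\} \\
&= \frac{\mu_4}{n^3} + \frac{3(n-1)}{n^3} \mu_2^2.
\end{align*}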


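Theorem 5. For the moments of $b_2$, we have
\[ E(b_2) = \frac{(n-1)\sigma^2}{n}, \tag{23} \]
\[ \operatorname{var}(b_2) = \frac{\mu_4 - \mu_2^2}{n} - \frac{2(\mu_4 - 2\mu_2^2)}{n^2} + \frac{\mu_4 - 3\mu_2^2}{n^3}, \tag{24} \]
\[ E(b_3) = \frac{(n-1)(n-2)}{n^2} \mu_3, \tag{25} \]
\[ E(b_4) = \frac{(n-1)(n^2 - 3n + 3)}{n^3} \mu_4 + \frac{3(n-1)(2n-3)}{n^3} \mu_2^2. \tag{26} \]

Proof. We have
\begin{align*}
E b_2 &= E\Big(\frac{1}{n} \sum_{j} X_j^2\Big) - E(\bar{X}^2) = m_2 - \frac{1}{n^2} E\Big(\sum_{j} X_j^2 + \sum_{i \ne j} X_i X_j\Big) \\
&= m_2 - \frac{1}{n^2} \{ n m_2 + n(n-1)\mu^2 \} \\
&= \Big(\frac{n-1}{n}\Big)(m_2 - \mu^2) = \Big(\frac{n-1}{n}\Big)\sigma^2.
\end{align*}
Now
\[ n^2 b_2^2 = \Big[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2\Big]^2. \]
Writing $Y_i = X_i - \mu$ we see that $EY_i = 0$, $\operatorname{var}(Y_i) = \sigma^2$, and $EY_i^4 = \mu_4$.
We have
\[ n^2 E b_2^2 = E\Big[\sum_{i} Y_i^4 + \sum_{i \ne j} Y_i^2 Y_j^2 - \frac{2}{n} \Big(\sum_{j} Y_j^2\Big)\Big(\sum_{j} Y_j\Big)^2 + \frac{1}{n^2} \Big(\sum_{j} Y_j\Big)^4\Big]. \]
It follows that
\begin{align*}
n^2 E b_2^2 &= n\mu_4 + n(n-1)\sigma^4 - \frac{2}{n} [n(n-1)\sigma^4 + n\mu_4] + \frac{1}{n^2} [3n(n-1)\sigma^4 + n\mu_4] \\
&= \Big(n - 2 + \frac{1}{n}\Big)\mu_4 + \Big(n - 2 + \frac{3}{n}\Big)(n-1)\mu_2^2 \qquad (\mu_2 = \sigma^2).
\end{align*}
Therefore
\begin{align*}
\operatorname{var}(b_2) &= E b_2^2 - (E b_2)^2 \\
&= \Big(n - 2 + \frac{1}{n}\Big)\frac{\mu_4}{n^2} + \Big(n - 2 + \frac{3}{n}\Big)(n-1)\frac{\mu_2^2}{n^2} - \Big(\frac{n-1}{n}\Big)^2 \mu_2^2 \\
&= \Big(n - 2 + \frac{1}{n}\Big)\frac{\mu_4}{n^2} + (n-1)(3-n)\frac{\mu_2^2}{n^3},
\end{align*}
as asserted.
Relations (25) and (26) can be proved similarly.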


Corollary 1. $ES^2 = \sigma^2$.
This is precisely the reason why we call $S^2$, and not $b_2$, the sample variance.

Corollary 2. $\operatorname{var}(S^2) = \dfrac{\mu_4}{n} + \dfrac{3-n}{n(n-1)}\, \mu_2^2$.
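Both corollaries can be verified exactly for a small discrete population by enumerating every possible sample. The sketch below is only an illustration: the three-point population and the sample size n = 4 are arbitrary choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: values and their probabilities.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 4

mu = sum(p * v for v, p in zip(values, probs))

def central(k):
    """Population central moment mu_k = E(X - mu)^k."""
    return sum(p * (v - mu) ** k for v, p in zip(values, probs))

mu2, mu4 = central(2), central(4)

def s_squared(sample):
    """Sample variance S^2 with divisor n - 1."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / (len(sample) - 1)

# Exact E(S^2) and E(S^4) by summing over all 3^n ordered samples.
e_s2 = e_s4 = Fraction(0)
for idx in product(range(len(values)), repeat=n):
    prob = Fraction(1)
    for i in idx:
        prob *= probs[i]
    s2 = s_squared([values[i] for i in idx])
    e_s2 += prob * s2
    e_s4 += prob * s2 ** 2

var_s2 = e_s4 - e_s2 ** 2
formula = mu4 / n + Fraction(3 - n, n * (n - 1)) * mu2 ** 2

print(e_s2 == mu2)          # E S^2 equals the population variance
print(var_s2 == formula)    # var(S^2) matches the closed form
```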


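The following result provides a justification for our definition of sample
covariance.

Theorem 6. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate
population with variances $\sigma_1^2$, $\sigma_2^2$ and covariance $\rho\sigma_1\sigma_2$. Then
\[ ES_1^2 = \sigma_1^2, \quad ES_2^2 = \sigma_2^2, \quad \text{and} \quad ES_{11} = \rho\sigma_1\sigma_2, \tag{27} \]
where $S_1^2$, $S_2^2$, and $S_{11}$ are defined in (12) and (13).

Proof. It follows from Corollary 1 to Theorem 5 that $ES_1^2 = \sigma_1^2$ and
$ES_2^2 = \sigma_2^2$. To prove that $ES_{11} = \rho\sigma_1\sigma_2$ we note that $X_i$ is independent
of $X_j$ ($i \ne j$) and $Y_j$ ($i \ne j$). We have
\[ (n-1) ES_{11} = E\Big\{\sum_{j=1}^{n} (X_j - \bar{X})(Y_j - \bar{Y})\Big\}. \]
Now
\begin{align*}
E(X_j - \bar{X})(Y_j - \bar{Y}) &= E\Big\{X_j Y_j - X_j \frac{\sum_k Y_k}{n} - Y_j \frac{\sum_k X_k}{n} + \frac{(\sum_k X_k)(\sum_k Y_k)}{n^2}\Big\} \\
&= EXY - \frac{1}{n}[EXY + (n-1)\,EX\,EY] - \frac{1}{n}[EXY + (n-1)\,EX\,EY] \\
&\quad + \frac{1}{n^2}[n\,EXY + n(n-1)\,EX\,EY] \\
&= \frac{n-1}{n}(EXY - EX\,EY),
\end{align*}
and it follows that
\[ (n-1) ES_{11} = n \Big(\frac{n-1}{n}\Big)(EXY - EX\,EY), \]
that is,
\[ ES_{11} = EXY - EX\,EY = \operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2, \]
as asserted.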


We next turn our attention to the distributions of sample characteristics.
Several possibilities exist. If the exact sampling distribution is required, the
method of transformation described in Section 4.4 can be used. Sometimes
the technique of mgf can be applied. Thus, if $X_1, X_2, \ldots, X_n$ is a random
sample from a population distribution for which the mgf exists, the mgf of
the sample mean $\bar{X}$ is given by
\[ M_{\bar{X}}(t) = \prod_{j=1}^{n} E e^{tX_j/n} = \Big[M\Big(\frac{t}{n}\Big)\Big]^n, \tag{28} \]
where M is the mgf of the population distribution. If $M_{\bar{X}}(t)$ has one of the
known forms, it is possible to write the pdf of $\bar{X}$. Although this method has
the obvious drawback that it applies only to distributions for which all
moments exist, we will see in Section 7.4 its effectiveness in the important
case of sampling from a normal population.


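Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(\alpha, 1)$ distribution. We
will compute the pdf of $\bar{X}$. We have
\[ M_{\bar{X}}(t) = \Big[M\Big(\frac{t}{n}\Big)\Big]^n = \frac{1}{(1 - t/n)^{\alpha n}}, \qquad \frac{t}{n} < 1, \]
so that $\bar{X}$ is a $G(\alpha n, 1/n)$ variate.

Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution
on (0, 1). Consider the geometric mean
\[ \bar{X}_g = \Big(\prod_{j=1}^{n} X_j\Big)^{1/n}. \]
We have $\log \bar{X}_g = (1/n) \sum_j \log X_j$, so that $\log \bar{X}_g$ is the mean of $\log X_1,
\ldots, \log X_n$.
The common pdf of $\log X_1, \ldots, \log X_n$ is
\[ f(x) = \begin{cases} e^x & \text{if } x < 0, \\ 0 & \text{otherwise,} \end{cases} \]
which is the negative exponential distribution with parameter $\beta = 1$. We
see that the mgf of $\log \bar{X}_g$ is given by
\[ M(t) = \prod_{j=1}^{n} E e^{t \log X_j / n} = \frac{1}{(1 + t/n)^n}, \]
and the pdf of $\log \bar{X}_g$ is given by
\[ f^*(x) = \begin{cases} \dfrac{n^n}{\Gamma(n)} (-x)^{n-1} e^{nx}, & -\infty < x < 0, \\ 0, & \text{otherwise.} \end{cases} \]
It follows that $\bar{X}_g$ has pdf
\[ f(y) = \begin{cases} \dfrac{n^n}{\Gamma(n)}\, y^{n-1} (-\log y)^{n-1}, & 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases} \]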
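Example 3 (Hogben [49]). Let $X_1, X_2, \ldots, X_n$ be a random sample from a
Bernoulli distribution with parameter p, 0 < p < 1. Let $\bar{X}$ be the sample
mean and $S^2$ be the sample variance. We will find the pmf of $S^2$. Note that
$S_n = \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i^2$ and that $S_n$ is b(n, p). Since
\[ (n-1) S^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = S_n - \frac{S_n^2}{n} = \frac{S_n (n - S_n)}{n}, \]
$S^2$ only assumes values of the form
\[ t = \frac{i(n-i)}{n(n-1)}, \qquad i = 0, 1, 2, \ldots, \Big[\frac{n}{2}\Big], \]
where [x] is the largest integer $\le x$. Thus
\begin{align*}
P\{S^2 = t\} &= P\{nS_n - S_n^2 = i(n-i)\} \\
&= P\Big\{\Big(S_n - \frac{n}{2}\Big)^2 = \Big(i - \frac{n}{2}\Big)^2\Big\} \\
&= P\{S_n = i \text{ or } S_n = n - i\} \\
&= \binom{n}{i} p^i (1-p)^{n-i} + \binom{n}{i} p^{n-i} (1-p)^i \\
&= \binom{n}{i} p^i (1-p)^i \{p^{n-2i} + (1-p)^{n-2i}\}, \qquad i < \frac{n}{2}.
\end{align*}
If n is even, n = 2m, say, where m > 0 is an integer, and i = m, then
\[ P\Big\{S^2 = \frac{m}{2(2m-1)}\Big\} = \binom{2m}{m} p^m (1-p)^m. \]
In particular, if n = 7, $S^2$ = 0, 1/7, 5/21, and 2/7 with probabilities
$\{p^7 + (1-p)^7\}$, $7p(1-p)\{p^5 + (1-p)^5\}$, $21p^2(1-p)^2\{p^3 + (1-p)^3\}$,
and $35p^3(1-p)^3$, respectively.

If n = 6, then $S^2$ = 0, 1/6, 4/15, and 3/10 with probabilities $\{p^6 + (1-p)^6\}$,
$6p(1-p)\{p^4 + (1-p)^4\}$, $15p^2(1-p)^2\{p^2 + (1-p)^2\}$, and $20p^3(1-p)^3$,
respectively.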
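The pmf of Example 3 can be tabulated for any n. The following sketch (the function name `s2_pmf` is hypothetical) uses exact fractions, so the probabilities can be compared directly with the values for n = 7 and n = 6.

```python
from fractions import Fraction
from math import comb

def s2_pmf(n, p):
    """pmf of S^2 for a Bernoulli(p) sample of size n, as {value: probability}."""
    p = Fraction(p)
    q = 1 - p
    pmf = {}
    for i in range(n // 2 + 1):
        t = Fraction(i * (n - i), n * (n - 1))      # value taken by S^2
        if 2 * i == n:
            # S_n = i and S_n = n - i coincide when i = n/2.
            prob = comb(n, i) * p ** i * q ** i
        else:
            prob = comb(n, i) * p ** i * q ** i * (p ** (n - 2 * i) + q ** (n - 2 * i))
        pmf[t] = prob
    return pmf

p = Fraction(1, 3)
for n in (7, 6):
    pmf = s2_pmf(n, p)
    print(n, sorted(pmf), sum(pmf.values()))   # support of S^2 and total probability
```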

We have already considered the distribution of the sample quantiles in
Section 4.5 and the distribution of the range $X_{(n)} - X_{(1)}$ in Example 4.4.10.
It can be shown, without much difficulty, that the distribution of the sample
median is given by
\[ f_r(y) = \frac{n!}{(r-1)!\,(n-r)!} [F(y)]^{r-1} [1 - F(y)]^{n-r} f(y) \qquad \text{if } r = \frac{n+1}{2}, \tag{29} \]
where F and f are the population df and pdf, respectively. If n = 2m, and
the median is taken as the average of $X_{(m)}$ and $X_{(m+1)}$, then
\[ f(y) = \frac{2\,(2m)!}{[(m-1)!]^2} \int_y^{\infty} [F(2y - v)]^{m-1} [1 - F(v)]^{m-1} f(2y - v) f(v)\, dv. \tag{30} \]

Example 4. Let $X_1, X_2, \ldots, X_n$ be a random sample from U(0, 1). Then the
integrand in (30) is positive for the intersection of the regions 0 < 2y - v < 1
and 0 < v < 1. This gives (v/2) < y < (v + 1)/2, y < v, and 0 < v < 1. The
shaded area in Fig. 1 gives the limits on the integral as
\[ y < v < 2y \quad \text{if } 0 < y \le \tfrac{1}{2}, \qquad y < v < 1 \quad \text{if } \tfrac{1}{2} < y < 1. \]
In particular, if m = 2, the pdf of the median, $(X_{(2)} + X_{(3)})/2$, is given
by
\[ f(y) = \begin{cases} 8y^2(3 - 4y) & \text{if } 0 < y < \tfrac{1}{2}, \\ 8(4y^3 - 9y^2 + 6y - 1) & \text{if } \tfrac{1}{2} < y < 1, \\ 0 & \text{otherwise.} \end{cases} \]

[Fig. 1]

If the asymptotic distribution is required, either the central limit theorem
or the theorem of Cramér (Theorem 6.2.15) can be used.

Consider, for example, n independent observations $X_1, X_2, \ldots, X_n$ on an
rv X. Let k be a positive integer, and assume that $E|X|^{2k} < \infty$. Then each
$X_j^k$ has mean $m_k$ and variance $m_{2k} - (m_k)^2$. By the CLT it follows that, if
$a_k$ is the kth sample moment, then
\[ \frac{\sqrt{n}\,(a_k - m_k)}{\sqrt{m_{2k} - m_k^2}} \xrightarrow{L} Z \quad \text{as } n \to \infty, \tag{31} \]
where Z is $\mathcal{N}(0, 1)$. Thus $a_k$ is asymptotically normal $(m_k, (m_{2k} - m_k^2)/n)$.
In particular, $\bar{X}$ is asymptotically $\mathcal{N}(\mu, \sigma^2/n)$.
In a similar manner, but with the help of a somewhat more difficult argument,
we can show that the sample central moment of order k is also asymptotically
normally distributed. We refer to Cramér [18], page 365, for details.
Indeed, we can show under mild conditions that any sample characteristic
based on (sample) moments is asymptotically normal with parameters that
are identical with the corresponding population characteristic; see Cramér
[18], pages 366-367.
Example 5. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(\mu, \sigma^2)$. Also, let $\bar{X}$ be the sample mean,
and $S^2$ the sample variance. Consider the rv
\[ T_n = \frac{\sqrt{n}(\bar{X} - \mu)}{S} = \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{S/\sigma} = \frac{U_n}{S/\sigma}, \]
where $U_n$ is $\mathcal{N}(0, 1)$. In Section 7.5 we will determine the distribution of
$T_n$. Here we use Cramér's theorem to show that $T_n \xrightarrow{L} Z$, where Z is $\mathcal{N}(0, 1)$.
From Example 6.3.4 we see that $[(n-1)/n]\, S^2 \xrightarrow{P} \sigma^2$, so that $(S/\sigma)^2 \xrightarrow{P} 1$. It
follows that $T_n \xrightarrow{L} Z$. See Problem 6.6.5 for a more general result.

The following result gives the asymptotic distribution of the rth-order
statistic, $1 \le r \le n$, in sampling from a population with an absolutely
continuous df F with pdf f.

Theorem 7. If $X_{(r)}$ denotes the rth-order statistic of a sample $X_1, X_2, \ldots,
X_n$ from an absolutely continuous df F with pdf f, then
\[ \Big\{\frac{n}{p(1-p)}\Big\}^{1/2} f(\mathfrak{z}_p)\, \{X_{(r)} - \mathfrak{z}_p\} \xrightarrow{L} Z \quad \text{as } n \to \infty, \tag{32} \]
so that r/n remains fixed, r/n = p, where Z is $\mathcal{N}(0, 1)$, and $\mathfrak{z}_p$ is the unique
solution of $F(\mathfrak{z}_p) = p$ (that is, $\mathfrak{z}_p$ is the population quantile of order p, assumed
unique).

Proof. For the proof we refer to Problem 10.

Remark 1. The sample quantile of order p, $Z_p$, is asymptotically
\[ \mathcal{N}\Big(\mathfrak{z}_p,\ \frac{1}{[f(\mathfrak{z}_p)]^2}\, \frac{p(1-p)}{n}\Big), \]
where $\mathfrak{z}_p$ is the corresponding population quantile, and f is the pdf of the
population distribution function. It also follows that $Z_p \xrightarrow{P} \mathfrak{z}_p$.
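As a numerical illustration of Remark 1, the asymptotic variance $p(1-p)/\{n[f(\mathfrak{z}_p)]^2\}$ can be evaluated for a normal population. The sketch assumes Python 3.8 or later for `statistics.NormalDist`; the function name `quantile_asymptotic_variance` is hypothetical.

```python
from math import pi
from statistics import NormalDist

def quantile_asymptotic_variance(dist, p, n):
    """Asymptotic variance p(1 - p) / (n f(z_p)^2) of the sample quantile of order p."""
    z_p = dist.inv_cdf(p)                  # population quantile of order p
    return p * (1 - p) / (n * dist.pdf(z_p) ** 2)

# For the median of a standard normal population this reduces to pi / (2n),
# which exceeds the variance 1/n of the sample mean.
n = 100
print(quantile_asymptotic_variance(NormalDist(0, 1), 0.5, n))
print(pi / (2 * n))
```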


PROBLEMS 7.3 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a df F, and let $F_n^*(x)$ be the sample
distribution function. Find $\operatorname{cov}(F_n^*(x), F_n^*(y))$ for fixed real numbers x, y.

2. Let $F_n^*$ be the empirical df of a random sample from df F. Show that
\[ P\Big\{|F_n^*(x) - F(x)| \ge \frac{\varepsilon}{\sqrt{n}}\Big\} \le \frac{1}{4\varepsilon^2} \quad \text{for all } \varepsilon > 0. \]

$U_i = aX_i + b$ ($a \ne 0$) and $V_i = cY_i + d$ ($c \ne 0$), what is the sample correlation
coefficient between the U's and the V's?

5. Derive (15) for the line of regression of Y on X, based on the sample $(x_1, y_1)$,
$(x_2, y_2), \ldots, (x_n, y_n)$.

6. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(\mu, \sigma^2)$. Compute the first four
sample moments of $\bar{X}$ about the origin and about the mean. Also compute the first
four sample moments of $S^2$ about the mean.

7. Derive the pdf of the median as given in (29) and (30).

8. Let $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ be the order statistic of a sample of size n from U(0, 1).
Compute $EU_{(r)}^k$ for any $1 \le r \le n$ and integer k (> 0). In particular, show that
\[ EU_{(r)} = \frac{r}{n+1} \quad \text{and} \quad \operatorname{var}(U_{(r)}) = \frac{r(n-r+1)}{(n+1)^2 (n+2)}. \]
Show also that the correlation coefficient between $U_{(r)}$ and $U_{(s)}$ for $1 \le r < s \le n$
is given by $[r(n-s+1)/s(n-r+1)]^{1/2}$.

9. Let $X_1, X_2, \ldots, X_n$ be a sample from an absolutely continuous df F with pdf f.
Show that
\[ EX_{(r)} \approx F^{-1}\Big(\frac{r}{n+1}\Big) \quad \text{and} \quad \operatorname{var}(X_{(r)}) \approx \frac{r(n-r+1)}{(n+1)^2 (n+2)}\, \frac{1}{\{f[F^{-1}(r/(n+1))]\}^2}. \]
[Hint: Let Y be an rv with mean $\mu$, and $\varphi$ be a Borel function such that $E\varphi(Y)$
exists. Expand $\varphi(Y)$ about the point $\mu$ by a Taylor series expansion, and use the
fact that $F(X_{(r)}) = U_{(r)}$.]

10. Prove Theorem 7.
[Hint: For any real $\mu$ and $\sigma$ (> 0) compute the pdf of $(U_{(r)} - \mu)/\sigma$ and show that
the standardized $U_{(r)}$, $(U_{(r)} - \mu)/\sigma$, is asymptotically $\mathcal{N}(0, 1)$ under the conditions
of the theorem.]

11. Let $X_1, X_2, \ldots, X_n$ be n independent observations on X. Find the sampling
distribution of $\bar{X}$, the sample mean, if (a) $X \sim P(\lambda)$, (b) $X \sim \mathcal{C}(1, 0)$, (c) $X \sim \chi^2(m)$.

12. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(\alpha, \beta)$. Let us write
$Y_n = (\bar{X} - \alpha\beta)/\beta\sqrt{\alpha/n}$, n = 1, 2, ....

(a) Compute the first four moments of $Y_n$, and compare them with the first four
moments of the standard normal distribution.

(b) Compute the coefficients of skewness $\alpha_3$ and of kurtosis $\alpha_4$ for the rv's $Y_n$.
(For definitions of $\alpha_3$, $\alpha_4$ see Problem 3.2.10.)

13. Let $X_1, X_2, \ldots, X_n$ be a random sample from U[0, 1]. Also, let $Z_n = (\bar{X} - .5)/\sqrt{1/(12n)}$.
Repeat Problem 12 for the sequence $Z_n$.

14. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find $\operatorname{var}(S^2)$, and compare
it with $\operatorname{var}(\bar{X})$. Note that $E\bar{X} = \lambda = ES^2$.
(Hint: Use Problem 3.2.9.)

15. Prove (25) and (26).


7.4 CHI-SQUARE, t-, AND F-DISTRIBUTIONS:
EXACT SAMPLING DISTRIBUTIONS

In this section we investigate certain distributions that arise in sampling
from a normal population. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$.
Then we know that $\bar{X} \sim \mathcal{N}(\mu, \sigma^2/n)$. Also, $\{\sqrt{n}(\bar{X} - \mu)/\sigma\}^2$ is $\chi^2(1)$. We
will determine the distribution of $S^2$ in the next section. Here we mainly
define chi-square, t-, and F-distributions and study their properties. Their
importance will become evident in the next section and later in the testing
of statistical hypotheses (Chapter 10).

The first distribution of interest is the chi-square distribution, defined in
Chapter 5 as a special case of the gamma distribution. Let n > 0 be an
integer. Then G(n/2, 2) is a $\chi^2(n)$ rv. In view of Theorem 5.3.32 and Corollary
2 to Theorem 5.3.4, the following result holds.


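Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid rv's, and let $S_n = \sum_{k=1}^{n} X_k$. Then
(i) $S_n \sim \chi^2(n) \iff X_1 \sim \chi^2(1)$,
and
(ii) $X_1 \sim \mathcal{N}(0, 1) \implies \sum_{k=1}^{n} X_k^2 \sim \chi^2(n)$.

If X has a chi-square distribution with n d.f., we write $X \sim \chi^2(n)$. We
recall that, if $X \sim \chi^2(n)$, its pdf is given by
\[ f(x) = \begin{cases} \dfrac{x^{n/2 - 1} e^{-x/2}}{2^{n/2}\, \Gamma(n/2)} & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} \tag{1} \]
the mgf by
\[ M(t) = (1 - 2t)^{-n/2} \quad \text{for } t < \tfrac{1}{2}, \tag{2} \]
and the mean and the variance by
\[ EX = n, \qquad \operatorname{var}(X) = 2n. \tag{3} \]

The $\chi^2(n)$ distribution is tabulated for values of n = 1, 2, .... Tables usually
go up to n = 30, since for n > 30 it is possible to use the normal approximation.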


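Theorem 2 (Fisher). If $X \sim \chi^2(n)$, then
\[ \lim_{n \to \infty} P\{\sqrt{2X} - \sqrt{2n-1} \le z\} = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy. \tag{4} \]

Proof. Since X is the sum of n iid $\chi^2(1)$ rv's, we apply the CLT to see that
\[ Z_n = \frac{X - n}{\sqrt{2n}} \]
is asymptotically normal, that is,
\[ \lim_{n \to \infty} P\{Z_n \le z\} = \lim_{n \to \infty} P\Big\{\frac{X - n}{\sqrt{2n}} \le z\Big\} = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy. \]
Now, for all sufficiently large n,
\begin{align*}
P\{\sqrt{2\chi^2(n)} - \sqrt{2n-1} \le z\} &= P\{2\chi^2(n) \le (z + \sqrt{2n-1})^2\} \\
&= P\Big\{\frac{\chi^2(n) - n}{\sqrt{2n}} \le \frac{z^2 - 1}{2\sqrt{2n}} + z\sqrt{\frac{2n-1}{2n}}\Big\}.
\end{align*}
The bound on the right side tends to z as $n \to \infty$, and the limiting df is
continuous, so that
\[ \lim_{n \to \infty} P\{\sqrt{2\chi^2(n)} - \sqrt{2n-1} \le z\} = \lim_{n \to \infty} P\{Z_n \le z\} = P\{Z \le z\}, \]
where Z is $\mathcal{N}(0, 1)$.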


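Remark 1. It follows that for z > 0
\begin{align*}
P\{\chi^2(n) \le z\} &= P\{2\chi^2(n) \le 2z\} = P\{\sqrt{2\chi^2(n)} \le \sqrt{2z}\} \\
&= P\{\sqrt{2\chi^2(n)} - \sqrt{2n-1} \le \sqrt{2z} - \sqrt{2n-1}\} \\
&\approx \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\sqrt{2z} - \sqrt{2n-1}} e^{-x^2/2}\, dx. \tag{5}
\end{align*}

We will write $\chi^2_{n,\alpha}$ for the upper $\alpha$ percent point of the $\chi^2(n)$ distribution,
that is,
\[ P\{\chi^2(n) > \chi^2_{n,\alpha}\} = \alpha, \tag{6} \]
and use approximation (4) or (5) for n > 30 since this provides a better
approximation than the CLT. Table 3 on page 652 gives the values of $\chi^2_{n,\alpha}$
for some selected values of n and $\alpha$.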

Example 1. Let n = 25. Then, from Table 3, page 652,
\[ P\{\chi^2(25) \le 34.382\} = .90. \]
Let us approximate this probability by normal tables using Theorem 2 and
compare it with the approximation obtained from the CLT. From (5) we
have
\[ P\{\chi^2(25) \le 34.382\} \approx P\{Z \le \sqrt{68.764} - \sqrt{49}\} = P\{Z \le 1.29\} = .9015, \]
where Z is $\mathcal{N}(0, 1)$.
To use the CLT, we see that $E\chi^2(25) = 25$, $\operatorname{var}\chi^2(25) = 50$, so that
\[ P\{\chi^2(25) \le 34.382\} = P\Big\{\frac{\chi^2(25) - 25}{\sqrt{50}} \le \frac{34.382 - 25}{\sqrt{50}}\Big\} \approx P\{Z \le 1.32\} = .9066. \]
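The comparison in Example 1 can be reproduced without tables. The sketch below is an illustration under stated assumptions: the $\chi^2$ df is evaluated with a standard power series for the regularized incomplete gamma function (an implementation choice, not a method from the text), and the names `chi2_cdf`, `fisher_approx`, and `clt_approx` are hypothetical. The normal arguments are not rounded to two decimals, so the printed values differ slightly from the table look-ups above.

```python
from math import erf, exp, lgamma, log, sqrt

def norm_cdf(z):
    """df of N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def chi2_cdf(x, n, terms=500):
    """P{chi^2(n) <= x}, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, h = n / 2, x / 2
    term = total = 1.0
    for k in range(1, terms):
        term *= h / (a + k)        # h^k / ((a + 1)(a + 2)...(a + k))
        total += term
    return exp(a * log(h) - h - lgamma(a + 1)) * total

def fisher_approx(x, n):
    """Approximation (5): P{Z <= sqrt(2x) - sqrt(2n - 1)}."""
    return norm_cdf(sqrt(2 * x) - sqrt(2 * n - 1))

def clt_approx(x, n):
    """CLT approximation: P{Z <= (x - n) / sqrt(2n)}."""
    return norm_cdf((x - n) / sqrt(2 * n))

x, n = 34.382, 25
print(chi2_cdf(x, n), fisher_approx(x, n), clt_approx(x, n))
```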


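Definition 1. Let $X_1, X_2, \ldots, X_n$ be independent normal rv's with $EX_i = \mu_i$
and $\operatorname{var}(X_i) = \sigma^2$, i = 1, 2, ..., n. Also, let $Y = \sum_{i=1}^{n} X_i^2/\sigma^2$. The rv Y is
said to be a noncentral chi-square rv with noncentrality parameter $\sum_{i=1}^{n} \mu_i^2/\sigma^2$
and n d.f. We will write $Y \sim \chi^2(n, \delta)$, where $\delta = \sum_{i=1}^{n} \mu_i^2/\sigma^2$.

The pdf of a $\chi^2(n, \delta)$ rv can be shown to be
\[ f_n(y, \delta) = \begin{cases} 2^{-n/2}\, \pi^{-1/2} \exp\{-\tfrac{1}{2}(\delta + y)\}\, y^{(n-2)/2} \displaystyle\sum_{j=0}^{\infty} \frac{(\delta y)^j\, \Gamma(j + \tfrac{1}{2})}{(2j)!\, \Gamma(j + \tfrac{1}{2} n)}, & y > 0, \\ 0, & y \le 0, \end{cases} \tag{7} \]
where $\delta = \sum_{i=1}^{n} \mu_i^2/\sigma^2$. Letting $\delta = 0$, we see that (7) reduces to the central
$\chi^2(n)$ pdf given in (1).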


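Although pdf (7) looks formidable, its mgf can easily be evaluated. We
have
\[ M(t) = E \exp\Big\{t \sum_{i=1}^{n} X_i^2/\sigma^2\Big\} = \prod_{i=1}^{n} E \exp\{tX_i^2/\sigma^2\}, \]
where $X_i \sim \mathcal{N}(\mu_i, \sigma^2)$. Thus
\[ E \exp\{tX_i^2/\sigma^2\} = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\Big\{\frac{t x_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2}\Big\}\, dx_i, \]
where the integral exists for t < 1/2. In the integrand we complete squares,
and after some simple algebra we obtain
\[ E \exp\{tX_i^2/\sigma^2\} = \frac{1}{\sqrt{1 - 2t}} \exp\Big\{\frac{t\mu_i^2}{\sigma^2(1 - 2t)}\Big\}, \qquad t < \tfrac{1}{2}. \]
It follows that
\[ M(t) = (1 - 2t)^{-n/2} \exp\Big\{\frac{t \sum_{i} \mu_i^2}{\sigma^2 (1 - 2t)}\Big\}, \qquad t < \tfrac{1}{2}, \tag{8} \]
and the mgf of a $\chi^2(n, \delta)$ rv is therefore
\[ M(t) = (1 - 2t)^{-n/2} \exp\Big\{\frac{t\delta}{1 - 2t}\Big\}, \qquad t < \tfrac{1}{2}. \tag{9} \]

It is immediate that, if $X_1, X_2, \ldots, X_k$ are independent, $X_i \sim \chi^2(n_i, \delta_i)$,
i = 1, 2, ..., k, then $\sum_{i=1}^{k} X_i$ is $\chi^2(\sum_{i=1}^{k} n_i, \sum_{i=1}^{k} \delta_i)$.

The mean and variance of $\chi^2(n, \delta)$ are easy to calculate. We have
\[ EY = \frac{1}{\sigma^2} \sum_{i=1}^{n} EX_i^2 = \frac{1}{\sigma^2} \sum_{i=1}^{n} [\operatorname{var}(X_i) + (EX_i)^2] = \frac{n\sigma^2 + \sum_i \mu_i^2}{\sigma^2} = n + \delta, \]
and
\begin{align*}
\operatorname{var}(Y) &= \operatorname{var}\Big(\sum_{i=1}^{n} \frac{X_i^2}{\sigma^2}\Big) = \frac{1}{\sigma^4} \sum_{i=1}^{n} \operatorname{var}(X_i^2) \\
&= \frac{1}{\sigma^4} \Big[\sum_{i=1}^{n} EX_i^4 - \sum_{i=1}^{n} (EX_i^2)^2\Big] \\
&= \frac{1}{\sigma^4} \Big\{\sum_{i=1}^{n} (3\sigma^4 + 6\sigma^2\mu_i^2 + \mu_i^4) - \sum_{i=1}^{n} (\sigma^2 + \mu_i^2)^2\Big\} \\
&= \frac{1}{\sigma^4} \Big(2n\sigma^4 + 4\sigma^2 \sum_{i=1}^{n} \mu_i^2\Big) = 2n + 4\delta.
\end{align*}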


We next turn our attention to Student's t-statistic, which arises quite
naturally in sampling from a normal population.

Definition 2. Let $X \sim \mathcal{N}(0, 1)$ and $Y \sim \chi^2(n)$, and let X and Y be independent.
Then the statistic
\[ T = \frac{X}{\sqrt{Y/n}} \tag{10} \]
is said to have a t-distribution with n d.f. and we write $T \sim t(n)$.

Theorem 3. The pdf of T defined in (10) is given by
\[ f_n(t) = \frac{\Gamma[(n+1)/2]}{\Gamma(n/2)\sqrt{n\pi}} \Big(1 + \frac{t^2}{n}\Big)^{-(n+1)/2}, \qquad -\infty < t < \infty. \tag{11} \]

Proof. The proof is left as an exercise.

Remark 2. For n = 1, T is a Cauchy rv. We will therefore assume that
n > 1. For each n, we have a different pdf. Like the normal distribution, the
t-distribution is important in the theory of statistics and hence is tabulated
(Table 4, page 654).

Remark 3. The pdf $f_n(t)$ is symmetric in t, and $f_n(t) \to 0$ as $t \to \pm\infty$. For
large n, the t-distribution is close to the normal distribution. Indeed,
$(1 + t^2/n)^{-(n+1)/2} \to e^{-t^2/2}$ as $n \to \infty$. For small n, however, T deviates
considerably from the normal. In fact,
\[ P\{|T| \ge t_0\} \ge P\{|Z| \ge t_0\}, \qquad Z \sim \mathcal{N}(0, 1), \]
that is, there is more probability in the tail of the t-distribution than in the
tail of the standard normal. In what follows we will write $t_{n,\alpha/2}$ for the
value (Fig. 1) of T for which
\[ P\{|T| > t_{n,\alpha/2}\} = \alpha. \tag{12} \]

[Fig. 1. The t(n) density, with area $\alpha/2$ in each tail beyond $-t_{n,\alpha/2} = t_{n,1-\alpha/2}$ and $t_{n,\alpha/2}$.]

In Table 4 on page 654 positive values of $t_{n,\alpha}$ are tabulated for some
selected values of n and $\alpha$. Negative values may be obtained from symmetry,
$t_{n,1-\alpha} = -t_{n,\alpha}$.

Example 2. Let n = 5. Then from Table 4 on page 654, we get
$t_{5,.025} = 2.571$ and $t_{5,.05} = 2.015$. The corresponding values under the
$\mathcal{N}(0, 1)$ distribution are $z_{.025} = 1.96$ and $z_{.05} = 1.65$. For n = 30,
$t_{30,.05} = 1.697$ and $z_{.05} = 1.65$.
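The closeness of the t(n) density to the normal density for large n, and its heavier tails for small n, can be seen numerically. This is a minimal sketch; the function names `t_pdf` and `normal_pdf` are hypothetical, and the density is evaluated on the log scale only to avoid overflow of the gamma function for large n.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, n):
    """pdf of the t-distribution with n d.f."""
    log_c = lgamma((n + 1) / 2) - lgamma(n / 2) - 0.5 * log(n * pi)
    return exp(log_c - (n + 1) / 2 * log(1 + t * t / n))

def normal_pdf(t):
    """pdf of N(0, 1)."""
    return exp(-t * t / 2) / sqrt(2 * pi)

for n in (1, 5, 30, 1000):
    # The density at t = 3 decreases toward the normal value as n grows.
    print(n, t_pdf(3.0, n), normal_pdf(3.0))
```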


Theorem 4. Let $X \sim t(n)$, n > 1. Then $EX^r$ exists for r < n. In particular,
if r < n is odd,
\[ EX^r = 0, \tag{13} \]
and if r < n is even,
\[ EX^r = n^{r/2}\, \frac{\Gamma[(r+1)/2]\, \Gamma[(n-r)/2]}{\Gamma(1/2)\, \Gamma(n/2)}. \tag{14} \]

Corollary. If n > 2, EX = 0 and $EX^2 = \operatorname{var}(X) = n/(n-2)$.


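Remark 4. If in Definition 2 we take $X \sim \mathcal{N}(\mu, \sigma^2)$, $Y/\sigma^2 \sim \chi^2(n)$, and X
and Y independent,
\[ T = \frac{X}{\sqrt{Y/n}} \]
is said to have a noncentral t-distribution with parameter (also called
noncentrality parameter) $\delta = \mu/\sigma$ and d.f. n. The pdf of a noncentral
t-distribution is given by
\[ f_n(t, \delta) = \frac{n^{n/2}\, e^{-\delta^2/2}}{\sqrt{\pi}\, \Gamma(n/2)\, (n + t^2)^{(n+1)/2}} \sum_{s=0}^{\infty} \Gamma\Big(\frac{n + s + 1}{2}\Big) \frac{\delta^s}{s!} \Big(\frac{\sqrt{2}\, t}{\sqrt{n + t^2}}\Big)^s. \tag{15} \]
Putting $\delta = 0$ in (15), we get the pdf $f_n(t)$ of a central t-distribution, given
in (11).

It is easy to show (Problem 3) that, if T has a noncentral t-distribution
with n d.f. and noncentrality parameter $\delta$, then
\[ ET = \delta\, \frac{\Gamma[(n-1)/2]}{\Gamma(n/2)} \sqrt{\frac{n}{2}}, \qquad n > 1, \tag{16} \]
and
\[ \operatorname{var}(T) = \frac{n(1 + \delta^2)}{n - 2} - \frac{\delta^2 n}{2} \Big(\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\Big)^2, \qquad n > 2. \tag{17} \]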

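Definition 3. Let X and Y be independent $\chi^2$ rv's with m and n d.f., respectively.
The rv
\[ F = \frac{X/m}{Y/n} \tag{18} \]
is said to have an F-distribution with (m, n) d.f., and we write $F \sim F(m, n)$.

Theorem 5. The pdf of the F-statistic defined in (18) is given by
\[ g(f) = \begin{cases} \dfrac{\Gamma[(m+n)/2]}{\Gamma(m/2)\, \Gamma(n/2)} \Big(\dfrac{m}{n}\Big) \Big(\dfrac{m}{n} f\Big)^{(m/2)-1} \Big(1 + \dfrac{m}{n} f\Big)^{-(m+n)/2}, & f > 0, \\ 0, & f \le 0. \end{cases} \tag{19} \]

Proof. The proof is left as an exercise.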


Remark 5. If $X \sim F(m, n)$, then $1/X \sim F(n, m)$. If we take m = 1, then
$F = [t(n)]^2$, so that F(1, n) and $t^2(n)$ have the same distribution. It also
follows that, if Z is $\mathcal{C}(1, 0)$ [which is the same as t(1)], $Z^2$ is F(1, 1).

Remark 6. As usual, we write $F_{m,n,\alpha}$ for the upper $\alpha$ percent point of the
F(m, n) distribution, that is,
\[ P\{F(m, n) > F_{m,n,\alpha}\} = \alpha. \tag{20} \]
From Remark 5, we have the following relation:
\[ F_{m,n,1-\alpha} = \frac{1}{F_{n,m,\alpha}}. \tag{21} \]
It therefore suffices to tabulate values of F that are $\ge 1$. This is done in
Table 5 on page 655, where values of $F_{m,n,\alpha}$ are listed for some selected
values of m, n, and $\alpha$.


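Theorem 6. Let $X \sim F(m, n)$. Then, for k > 0, integral,
\[ EX^k = \Big(\frac{n}{m}\Big)^k \frac{\Gamma[k + (m/2)]\, \Gamma[(n/2) - k]}{\Gamma(m/2)\, \Gamma(n/2)} \qquad \text{for } n > 2k. \tag{22} \]
In particular,
\[ EX = \frac{n}{n-2}, \qquad n > 2, \tag{23} \]
and
\[ \operatorname{var}(X) = \frac{2n^2 (m + n - 2)}{m(n-2)^2 (n-4)}, \qquad n > 4. \tag{24} \]

Proof. We have, for a positive integer k,
\[ \int_0^{\infty} f^k f^{(m/2)-1} \Big(1 + \frac{m}{n} f\Big)^{-(m+n)/2} df = \Big(\frac{n}{m}\Big)^{k + (m/2)} \int_0^1 x^{k + (m/2) - 1} (1 - x)^{(n/2) - k - 1}\, dx, \tag{25} \]
where we have changed the variable to $x = (m/n) f\, [1 + (m/n) f]^{-1}$. The
integral on the right side of (25) converges for (n/2) - k > 0 and diverges
for $(n/2) - k \le 0$. We have
\[ EX^k = \frac{\Gamma[(m+n)/2]}{\Gamma(m/2)\, \Gamma(n/2)} \Big(\frac{m}{n}\Big)^{m/2} \Big(\frac{n}{m}\Big)^{k + (m/2)} B\Big(k + \frac{m}{2},\ \frac{n}{2} - k\Big), \]
as asserted.

For k = 1, we get
\[ EX = \frac{n}{m}\, \frac{m/2}{(n/2) - 1} = \frac{n}{n-2}, \qquad n > 2. \]
Also,
\[ EX^2 = \Big(\frac{n}{m}\Big)^2 \frac{(m/2)[(m/2) + 1]}{[(n/2) - 1][(n/2) - 2]} = \Big(\frac{n}{m}\Big)^2 \frac{m(m+2)}{(n-2)(n-4)}, \qquad n > 4, \]
and
\[ \operatorname{var}(X) = \Big(\frac{n}{m}\Big)^2 \frac{m(m+2)}{(n-2)(n-4)} - \Big(\frac{n}{n-2}\Big)^2 = \frac{2n^2 (m + n - 2)}{m(n-2)^2 (n-4)}, \qquad n > 4. \]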
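The general moment formula (22) can be checked against the closed forms for the mean and the variance. The following is a minimal sketch with hypothetical function names (`f_moment`, `f_mean`, `f_variance`); the gamma functions are combined on the log scale to avoid overflow.

```python
from math import exp, lgamma, log

def f_moment(k, m, n):
    """E X^k for X ~ F(m, n), valid only for n > 2k."""
    if n <= 2 * k:
        raise ValueError("the kth moment exists only for n > 2k")
    return exp(k * log(n / m)
               + lgamma(k + m / 2) + lgamma(n / 2 - k)
               - lgamma(m / 2) - lgamma(n / 2))

def f_mean(n):
    """Closed form for EX, n > 2."""
    return n / (n - 2)

def f_variance(m, n):
    """Closed form for var(X), n > 4."""
    return 2 * n ** 2 * (m + n - 2) / (m * (n - 2) ** 2 * (n - 4))

m, n = 5, 10
print(f_moment(1, m, n), f_mean(n))
print(f_moment(2, m, n) - f_moment(1, m, n) ** 2, f_variance(m, n))
```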


Theorem 7. If $X \sim F(m, n)$, then Y = 1/[1 + (m/n)X] is B(n/2, m/2).
Consequently, for each x > 0,
\[ F_X(x) = 1 - F_Y\Big[\frac{1}{1 + (m/n)x}\Big]. \]

Proof. The proof is left as an exercise.


If in Definition 3 we take X to be a noncentral $\chi^2$ rv with m d.f. and
noncentrality parameter $\delta$, we get a noncentral F rv.

Definition 4. Let $X \sim \chi^2(m, \delta)$ and $Y \sim \chi^2(n)$, and let X and Y be independent.
Then the rv
\[ F = \frac{X/m}{Y/n} \tag{26} \]
is said to have a noncentral F-distribution with (m, n) d.f. and noncentrality
parameter $\delta$.

It can be shown that the pdf of F defined in (26) is given by
\[ g(f; m, n, \delta) = \begin{cases} \dfrac{m^{m/2}\, n^{n/2}\, e^{-\delta/2}}{\Gamma(n/2)}\, f^{(m/2)-1} \displaystyle\sum_{j=0}^{\infty} \frac{(\delta m f/2)^j\, \Gamma[(m+n)/2 + j]}{j!\, \Gamma[(m/2) + j]\, (mf + n)^{(m+n)/2 + j}} & \text{if } f > 0, \\ 0 & \text{if } f \le 0. \end{cases} \tag{27} \]
Substituting $\delta = 0$, we get the central F(m, n) pdf given in (19).

It is shown in Problem 2 that, if F has a noncentral F-distribution with
(m, n) d.f. and noncentrality parameter $\delta$,
\[ EF = \frac{n(m + \delta)}{m(n-2)}, \qquad n > 2, \]
and
\[ \operatorname{var}(F) = \frac{2n^2}{m^2 (n-4)(n-2)^2} [(m + \delta)^2 + (n-2)(m + 2\delta)], \qquad n > 4. \]


PROBLEMS 7.4

1. Let
\[ P_x = \Big\{\Gamma\Big(\frac{n}{2}\Big) 2^{n/2}\Big\}^{-1} \int_0^x \omega^{(n-2)/2} e^{-\omega/2}\, d\omega, \qquad x > 0. \]
Show that
\[ x < \frac{n}{1 - P_x}. \]

2. Let $X \sim F(m, n, \delta)$. Find EX and var(X).

3. Let T be a noncentral t-statistic with n d.f. and noncentrality parameter $\delta$.
Find ET and var(T).

4. Let $F \sim F(m, n)$. Then
\[ Y = \Big(1 + \frac{m}{n} F\Big)^{-1} \sim B\Big(\frac{n}{2}, \frac{m}{2}\Big). \]
Deduce that for x > 0
\[ P\{F \le x\} = 1 - P\Big\{Y \le \Big(1 + \frac{m}{n} x\Big)^{-1}\Big\}. \]

5. Derive the pdf of an F-statistic with (m, n) d.f.

6. Show that the square of a noncentral t-statistic is a noncentral F-statistic.

7. A sample of size 16 showed variance of 5.76. Find c such that
$P\{|\bar{X} - \mu| < c\} = .95$, where $\bar{X}$ is the sample mean and $\mu$ is the population
mean. Assume that the sample comes from a normal population.

8. A sample from a normal population produced variance 4.0. Find the size of
the sample if the sample mean deviates from the population mean by no more
than 2.0 with a probability of at least .95.

9. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $\mathcal{N}(0, 4)$. Find $P\{\sum_{i=1}^{5} X_i^2 \ge 5.75\}$.

10. Let $X \sim \chi^2(61)$. Find P{X > 50}.

11. Let $F \sim F(m, n)$. The random variable $Z = \tfrac{1}{2} \log F$ is known as Fisher's
Z-statistic. Find the pdf of Z.

12. Prove Theorem 1.

13. Prove Theorem 3.

14. Prove Theorem 4.

15. Prove Theorem 5.


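7.5 THE DISTRIBUTION OF $(\bar{X}, S^2)$ IN SAMPLING FROM A NORMAL
POPULATION

Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, and write $\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$
and $S^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$. In this section we show that $\bar{X}$ and
$S^2$ are independent and derive the distribution of $S^2$. More precisely, we
prove the following important result.

Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ rv's. Then $\bar{X}$ and $(X_1 - \bar{X},
X_2 - \bar{X}, \ldots, X_n - \bar{X})$ are independent.

Proof. We compute the mgf of $\bar{X}$ and $X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ as
follows:
\begin{align*}
M(t, t_1, t_2, \ldots, t_n) &= E \exp\{t\bar{X} + t_1(X_1 - \bar{X}) + t_2(X_2 - \bar{X}) + \cdots + t_n(X_n - \bar{X})\} \\
&= E \exp\Big\{\sum_{i=1}^{n} t_i X_i - \Big(\sum_{i=1}^{n} t_i - t\Big)\bar{X}\Big\} \\
&= E \exp\Big\{\sum_{i=1}^{n} X_i \Big(t_i - \frac{t_1 + t_2 + \cdots + t_n - t}{n}\Big)\Big\} \\
&= E \prod_{i=1}^{n} \exp\Big\{\frac{X_i (n t_i - n\bar{t} + t)}{n}\Big\} \qquad \Big(\text{where } \bar{t} = n^{-1} \sum_{i=1}^{n} t_i\Big) \\
&= \prod_{i=1}^{n} E \exp\Big\{\frac{X_i [t + n(t_i - \bar{t})]}{n}\Big\} \\
&= \prod_{i=1}^{n} \exp\Big\{\frac{\mu [t + n(t_i - \bar{t})]}{n} + \frac{\sigma^2}{2}\, \frac{1}{n^2} [t + n(t_i - \bar{t})]^2\Big\} \\
&= \exp\Big\{\frac{\mu}{n} \Big[nt + n \sum_{i=1}^{n} (t_i - \bar{t})\Big] + \frac{\sigma^2}{2n^2} \sum_{i=1}^{n} [t + n(t_i - \bar{t})]^2\Big\} \\
&= \exp(\mu t) \exp\Big\{\frac{\sigma^2}{2n^2} \Big(nt^2 + n^2 \sum_{i=1}^{n} (t_i - \bar{t})^2\Big)\Big\} \\
&= \exp\Big(\mu t + \frac{\sigma^2}{2n} t^2\Big) \exp\Big\{\frac{\sigma^2}{2} \sum_{i=1}^{n} (t_i - \bar{t})^2\Big\} \\
&= M_{\bar{X}}(t)\, M_{X_1 - \bar{X}, \ldots, X_n - \bar{X}}(t_1, t_2, \ldots, t_n) \\
&= M(t, 0, 0, \ldots, 0)\, M(0, t_1, t_2, \ldots, t_n).
\end{align*}
The result follows from Theorem 4.6.9.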


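Corollary 1. $\bar{X}$ and $S^2$ are independent.

Corollary 2. $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$.
Since
\[ \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n), \qquad n\Big(\frac{\bar{X} - \mu}{\sigma}\Big)^2 \sim \chi^2(1), \]
and $\bar{X}$ and $S^2$ are independent, it follows from
\[ \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} = n\Big(\frac{\bar{X} - \mu}{\sigma}\Big)^2 + (n-1)\frac{S^2}{\sigma^2} \]
that
\begin{align*}
E \exp\Big\{t \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2}\Big\} &= E \exp\Big\{tn\Big(\frac{\bar{X} - \mu}{\sigma}\Big)^2 + t(n-1)\frac{S^2}{\sigma^2}\Big\} \\
&= E \exp\Big\{tn\Big(\frac{\bar{X} - \mu}{\sigma}\Big)^2\Big\}\, E \exp\Big\{t(n-1)\frac{S^2}{\sigma^2}\Big\},
\end{align*}
that is,
\[ (1 - 2t)^{-n/2} = (1 - 2t)^{-1/2}\, E \exp\Big\{t(n-1)\frac{S^2}{\sigma^2}\Big\}, \qquad t < \tfrac{1}{2}, \]
and it follows that
\[ E \exp\Big\{t(n-1)\frac{S^2}{\sigma^2}\Big\} = (1 - 2t)^{-(n-1)/2}, \qquad t < \tfrac{1}{2}. \]
By the uniqueness of the mgf it follows that $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$.

Corollary 3. The distribution of $\sqrt{n}(\bar{X} - \mu)/S$ is t(n - 1).

Proof. Since $\sqrt{n}(\bar{X} - \mu)/\sigma$ is $\mathcal{N}(0, 1)$, and $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$, and
since $\bar{X}$ and $S^2$ are independent,
\[ \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{[(n-1)S^2/\sigma^2]/(n-1)}} = \frac{\sqrt{n}(\bar{X} - \mu)}{S} \]
is t(n - 1).

Corollary 4. If $X_1, X_2, \ldots, X_m$ are iid $\mathcal{N}(\mu_1, \sigma_1^2)$ rv's, $Y_1, Y_2, \ldots, Y_n$ are iid
$\mathcal{N}(\mu_2, \sigma_2^2)$ rv's, and the two samples are independently taken, $(S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$
is F(m - 1, n - 1). If, in particular, $\sigma_1 = \sigma_2$, then $S_1^2/S_2^2$ is F(m - 1, n - 1).

Corollary 5. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$, respectively, be independent
samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$. Then
\[ \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\{[(m-1)S_1^2/\sigma_1^2] + [(n-1)S_2^2/\sigma_2^2]\}^{1/2}} \sqrt{\frac{m + n - 2}{\sigma_1^2/m + \sigma_2^2/n}} \sim t(m + n - 2). \]
In particular, if $\sigma_1 = \sigma_2$, then
\[ \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{(m-1)S_1^2 + (n-1)S_2^2}} \sqrt{\frac{mn(m + n - 2)}{m + n}} \sim t(m + n - 2). \]
Corollary 5 follows since
\[ \bar{X} - \bar{Y} \sim \mathcal{N}\Big(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}\Big) \]
and
\[ \frac{(m-1)S_1^2}{\sigma_1^2} + \frac{(n-1)S_2^2}{\sigma_2^2} \sim \chi^2(m + n - 2), \]
and the two statistics are independent.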


Remark 1. The converse of Corollary 1 also holds. See Theorem 5.3.31.

Remark 2. In sampling from a symmetric distribution, $\bar{X}$ and $S^2$ are
uncorrelated. See Problem 48.4.

Remark 3. Alternatively, Corollary 1 could have been derived from Corollary 2
to Theorem 5.4.6 by using the Helmert orthogonal matrix:
\[ A = \begin{pmatrix}
1/\sqrt{n} & 1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n} \\
-1/\sqrt{2} & 1/\sqrt{2} & 0 & \cdots & 0 \\
-1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6} & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
-1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & \cdots & (n-1)/\sqrt{n(n-1)}
\end{pmatrix}. \]
For the case of n = 3 this was done in Example 4.4.8. In Problem 7 the
reader is asked to work out the details in the general case.
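The properties of the Helmert matrix that make this argument work can be checked numerically: the rows are orthonormal, the first transformed coordinate is $\sqrt{n}\,\bar{x}$, and the remaining coordinates have sum of squares $(n-1)s^2$. The sketch below is only an illustration; the function names `helmert` and `transform` and the data are hypothetical.

```python
from math import sqrt

def helmert(n):
    """Rows of the n x n Helmert orthogonal matrix."""
    rows = [[1 / sqrt(n)] * n]
    for k in range(2, n + 1):
        c = 1 / sqrt(k * (k - 1))
        # k - 1 entries equal to -c, then (k - 1) * c, then zeros.
        rows.append([-c] * (k - 1) + [(k - 1) * c] + [0.0] * (n - k))
    return rows

def transform(a, x):
    """Return y = A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

x = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(x)
a = helmert(n)
y = transform(a, x)

xbar = sum(x) / n
ss = sum((xi - xbar) ** 2 for xi in x)     # (n - 1) s^2

print(y[0], sqrt(n) * xbar)                # first coordinate carries the sample mean
print(sum(v * v for v in y[1:]), ss)       # remaining coordinates carry (n - 1) s^2
```

Since an orthogonal transformation of iid $\mathcal{N}(0, \sigma^2)$ rv's again gives iid $\mathcal{N}(0, \sigma^2)$ rv's, the first coordinate and the remaining ones are independent, which is the idea behind the alternative derivation.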


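Remark 4. An analytic approach to the development of the distribution of
$\bar{X}$ and $S^2$ is as follows. Assuming without loss of generality that $X_i$ is
$\mathcal{N}(0, 1)$, we have as the joint pdf of $(X_1, X_2, \ldots, X_n)$
\[ f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}} \exp\Big\{-\frac{1}{2} \sum_{i=1}^{n} x_i^2\Big\} = \frac{1}{(2\pi)^{n/2}} \exp\Big\{-\frac{(n-1)s^2 + n\bar{x}^2}{2}\Big\}. \]
Changing the variables to $y_1, y_2, \ldots, y_n$ by using the transformation
$y_k = (x_k - \bar{x})/s$, we see that
\[ \sum_{k=1}^{n} y_k = 0 \quad \text{and} \quad \sum_{k=1}^{n} y_k^2 = n - 1. \]
It follows that two of the $y_k$'s, say $y_{n-1}$ and $y_n$, are functions of the remaining
$y_k$. Thus either
\[ y_{n-1} = \frac{\alpha + \beta}{2}, \qquad y_n = \frac{\alpha - \beta}{2}, \]
or
\[ y_{n-1} = \frac{\alpha - \beta}{2}, \qquad y_n = \frac{\alpha + \beta}{2}, \]
where
\[ \alpha = -\sum_{k=1}^{n-2} y_k \quad \text{and} \quad \beta = \sqrt{2(n-1) - 2\sum_{k=1}^{n-2} y_k^2 - \Big(\sum_{k=1}^{n-2} y_k\Big)^2}. \]
We leave the reader to derive the joint pdf of $(Y_1, Y_2, \ldots, Y_{n-2}, \bar{X}, S^2)$,
using the result described in Remark 4.4.2, and to show that the rv's $\bar{X}$, $S^2$,
and $(Y_1, Y_2, \ldots, Y_{n-2})$ are independent.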


PROBLEMS 7.5 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(\mu, \sigma^2)$, and $\bar{X}$ and $S^2$,
respectively, be the sample mean and the sample variance. Let $X_{n+1} \sim \mathcal{N}(\mu, \sigma^2)$, and
assume that $X_1, X_2, \ldots, X_n, X_{n+1}$ are independent. Find the sampling distribution
of $[(X_{n+1} - \bar{X})/S]\sqrt{n/(n+1)}$.

2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
$\mathcal{N}(\mu_1, \sigma^2)$ and $\mathcal{N}(\mu_2, \sigma^2)$, respectively. Also, let $\alpha$, $\beta$ be two fixed real numbers. If
$\bar{X}$, $\bar{Y}$ denote the corresponding sample means, what is the sampling distribution of
\[ \frac{\alpha(\bar{X} - \mu_1) + \beta(\bar{Y} - \mu_2)}{\Big\{\dfrac{(m-1)S_1^2 + (n-1)S_2^2}{m + n - 2}\Big(\dfrac{\alpha^2}{m} + \dfrac{\beta^2}{n}\Big)\Big\}^{1/2}}, \]
where $S_1^2$ and $S_2^2$, respectively, denote the sample variances of the X's and the Y's?

3. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(\mu, \sigma^2)$, and k be a positive
integer. Find $E(S^{2k})$. In particular, find $E(S^2)$ and $\operatorname{var}(S^2)$.

4. A random sample of 5 is taken from a normal population with mean 2.5 and
variance $\sigma^2 = 36$.

(a) Find the probability that the sample variance lies between 30 and 44.

(b) Find the probability that the sample mean lies between 1.3 and 3.5, while the
sample variance lies between 30 and 44.

5. The mean life of a sample of 10 light bulbs was observed to be 1327 hours with
a standard deviation of 425 hours. A second sample of 6 bulbs chosen from a
different batch showed a mean life of 1215 hours with a standard deviation of 375
hours. If the means of the two batches are assumed to be the same, how probable is
the observed difference between the two sample means?

6. Let $S_1^2$ and $S_2^2$ be the sample variances from two independent samples of sizes
$n_1 = 5$ and $n_2 = 4$ from two populations having the same unknown variance $\sigma^2$.
Find (approximately) the probability that $S_1^2/S_2^2$ < 1/5.2 or > 6.25.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. By using the Helmert orthogonal
transformation defined in Remark 3, show that $\bar{X}$ and $S^2$ are independent.

8. Derive the joint pdf of $\bar{X}$ and $S^2$ by using the transformation described in
Remark 4.

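7.6 SAMPLING FROM A BIVARIATE NORMAL DISTRIBUTION

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal
population with parameters $\mu_1, \mu_2, \rho, \sigma_1^2, \sigma_2^2$. Let us write
\[ \bar{X} = n^{-1} \sum_{i=1}^{n} X_i, \qquad \bar{Y} = n^{-1} \sum_{i=1}^{n} Y_i, \]
\[ S_1^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad S_2^2 = (n-1)^{-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \]
and
\[ S_{11} = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}). \]
In this section we show that $(\bar{X}, \bar{Y})$ is independent of $(S_1^2, S_{11}, S_2^2)$ and
obtain the distribution of the sample correlation coefficient and regression
coefficients (at least in the special case where $\rho = 0$).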


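Theorem 1. The random vectors $(\bar{X}, \bar{Y})$ and $(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X},
Y_1 - \bar{Y}, Y_2 - \bar{Y}, \ldots, Y_n - \bar{Y})$ are independent. The joint distribution of
$(\bar{X}, \bar{Y})$ is bivariate normal with parameters $\mu_1, \mu_2, \rho, \sigma_1^2/n, \sigma_2^2/n$.

Proof. The proof follows along the lines of the proof of Theorem 7.5.1.
The mgf of $(\bar{X}, \bar{Y}, X_1 - \bar{X}, \ldots, X_n - \bar{X}, Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$ is given by
\begin{align*}
M^* &= M(u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n) \\
&= E \exp\Big\{u\bar{X} + v\bar{Y} + \sum_{i=1}^{n} t_i (X_i - \bar{X}) + \sum_{i=1}^{n} s_i (Y_i - \bar{Y})\Big\} \\
&= E \exp\Big\{\sum_{i=1}^{n} X_i \Big(\frac{u}{n} + t_i - \bar{t}\Big) + \sum_{i=1}^{n} Y_i \Big(\frac{v}{n} + s_i - \bar{s}\Big)\Big\},
\end{align*}
where $\bar{t} = n^{-1} \sum_{i=1}^{n} t_i$, $\bar{s} = n^{-1} \sum_{i=1}^{n} s_i$. Therefore
\begin{align*}
M^* &= \prod_{i=1}^{n} E \exp\Big\{\Big(\frac{u}{n} + t_i - \bar{t}\Big) X_i + \Big(\frac{v}{n} + s_i - \bar{s}\Big) Y_i\Big\} \\
&= \prod_{i=1}^{n} \exp\Big\{\Big(\frac{u}{n} + t_i - \bar{t}\Big)\mu_1 + \Big(\frac{v}{n} + s_i - \bar{s}\Big)\mu_2 \\
&\qquad + \frac{1}{2}\Big[\sigma_1^2 \Big(\frac{u}{n} + t_i - \bar{t}\Big)^2 + 2\rho\sigma_1\sigma_2 \Big(\frac{u}{n} + t_i - \bar{t}\Big)\Big(\frac{v}{n} + s_i - \bar{s}\Big) + \sigma_2^2 \Big(\frac{v}{n} + s_i - \bar{s}\Big)^2\Big]\Big\} \\
&= \exp\Big(\mu_1 u + \mu_2 v + \frac{u^2\sigma_1^2 + 2\rho\sigma_1\sigma_2 uv + v^2\sigma_2^2}{2n}\Big) \\
&\qquad \cdot \exp\Big\{\frac{1}{2}\sigma_1^2 \sum_{i=1}^{n} (t_i - \bar{t})^2 + \rho\sigma_1\sigma_2 \sum_{i=1}^{n} (t_i - \bar{t})(s_i - \bar{s}) + \frac{1}{2}\sigma_2^2 \sum_{i=1}^{n} (s_i - \bar{s})^2\Big\} \\
&= M_1(u, v)\, M_2(t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n) \\
&= M(u, v, 0, 0, \ldots, 0)\, M(0, 0, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n)
\end{align*}
for all real $u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n$, where $M_1$ is the mgf of $(\bar{X}, \bar{Y})$ and $M_2$ is
the mgf of $(X_1 - \bar{X}, \ldots, X_n - \bar{X}, Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$. Also, $M_1$ is the mgf
of a bivariate normal distribution. This completes the proof.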


Corollary. The sample mean vector $(\bar{X}, \bar{Y})$ is independent of the sample
variance-covariance matrix
\[ \begin{pmatrix} S_1^2 & S_{11} \\ S_{11} & S_2^2 \end{pmatrix} \]
in sampling from a bivariate normal population.

Remark 1. The result of Theorem 1 can be generalized to the case of
sampling from a k-variate normal population. We do not propose to do so
here.

Remark 2. Unfortunately the method of proof of Theorem 1 does not lead
to the distribution of the variance-covariance matrix. The distribution of
$(\bar{X}, \bar{Y}, S_1^2, S_{11}, S_2^2)$ was found by Fisher [31] and Romanovsky [102]. The
general case is due to Wishart [142], who determined the distribution of the
sample variance-covariance matrix in sampling from a k-dimensional normal
distribution. The distribution is named after him.

We will next compute the distribution of the sample correlation coefficient:

$$R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\big\{ \sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \big\}^{1/2}} = \frac{S_{11}}{S_1 S_2} \tag{1}$$

and that of the sample regression coefficient:

$$B_{Y|X} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{S_{11}}{S_1^2} = R \, \frac{S_2}{S_1} \tag{2}$$

of Y on X. The distribution of the regression coefficient of X on Y is similarly
determined. Since we will need only the distribution of $R$ and $B_{Y|X}$ whenever
$\rho = 0$, we will make this simplifying assumption in what follows. The general
case is computationally quite complicated. We refer the reader to Cramér
[18] for details.

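We note that

$$R = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar{X})}{(n-1) S_1 S_2} \tag{3}$$

and

$$B_{Y|X} = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar{X})}{(n-1) S_1^2}. \tag{4}$$

Moreover,

$$R^2 = \frac{B_{Y|X}^2 \, S_1^2}{S_2^2}. \tag{5}$$

In the following we write $B = B_{Y|X}$.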
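
Relations (2) through (5) are algebraic identities, so they can be checked on any data set. The following is a minimal Python sketch; the function name `sample_stats` and the five data points are illustrative assumptions, not part of the text.

```python
import math

def sample_stats(xs, ys):
    """Return (S1^2, S2^2, S11, R, B) for paired data, using divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s1_sq = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    s2_sq = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    s11 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    r = s11 / math.sqrt(s1_sq * s2_sq)   # definition (1)
    b = s11 / s1_sq                      # definition (2)
    return s1_sq, s2_sq, s11, r, b

# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [2.0, 1.0, 5.0, 6.0, 9.0]
s1_sq, s2_sq, s11, r, b = sample_stats(xs, ys)

n = len(xs)
xbar = sum(xs) / n
# Right-hand side of (4): the Y_i need not be centered.
b_from_4 = sum(y * (x - xbar) for x, y in zip(xs, ys)) / ((n - 1) * s1_sq)
# Right-hand side of (5).
r_sq_from_5 = b ** 2 * s1_sq / s2_sq
```

The centering of the $Y_i$ can be dropped in (3) and (4) because $\sum (X_i - \bar{X}) = 0$.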


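Theorem 2. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$, $n > 2$, be a sample from a bivariate
normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$,
$\operatorname{var}(Y) = \sigma_2^2$, and $\operatorname{cov}(X, Y) = 0$. In other words, let $X_1, X_2, \ldots, X_n$ be
iid $\mathcal{N}(\mu_1, \sigma_1^2)$ rv's, and $Y_1, Y_2, \ldots, Y_n$ be iid $\mathcal{N}(\mu_2, \sigma_2^2)$ rv's, and suppose
that the X's and Y's are independent. Then the pdf of R is given by

$$f_1(r) = \begin{cases} \dfrac{\Gamma[(n-1)/2]}{\Gamma(1/2) \, \Gamma[(n-2)/2]} \, (1 - r^2)^{(n-4)/2}, & -1 \le r \le 1, \\ 0, & \text{otherwise;} \end{cases} \tag{6}$$

and the pdf of B is given by

$$h_1(b) = \frac{\Gamma(n/2)}{\Gamma(1/2) \, \Gamma[(n-1)/2]} \, \frac{\sigma_1 \sigma_2^{\,n-1}}{(\sigma_2^2 + \sigma_1^2 b^2)^{n/2}}, \qquad -\infty < b < \infty. \tag{7}$$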


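Proof. Without any loss of generality, we assume that $\mu_1 = \mu_2 = 0$ and
$\sigma_1^2 = \sigma_2^2 = 1$, for we can always define

$$X_i^* = \frac{X_i - \mu_1}{\sigma_1} \quad \text{and} \quad Y_i^* = \frac{Y_i - \mu_2}{\sigma_2}.$$

Now note that the conditional distribution of $Y_i$, given $X_1, X_2, \ldots, X_n$, is
$\mathcal{N}(0, 1)$, and $Y_1, Y_2, \ldots, Y_n$, given $X_1, X_2, \ldots, X_n$, are mutually independent.
Let us define the following orthogonal transformation:

$$u_i = \sum_{j=1}^{n} c_{ij} y_j, \qquad i = 1, 2, \ldots, n, \tag{8}$$

where $(c_{ij})_{i,j = 1, 2, \ldots, n}$ is an orthogonal matrix with the first two rows

$$c_{1j} = \frac{1}{\sqrt{n}}, \qquad j = 1, 2, \ldots, n, \tag{9}$$

$$c_{2j} = \frac{x_j - \bar{x}}{\big\{ \sum_{i=1}^{n} (x_i - \bar{x})^2 \big\}^{1/2}}, \qquad j = 1, 2, \ldots, n. \tag{10}$$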


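It follows from orthogonality that for any $i \ge 2$

$$\sum_{j=1}^{n} c_{ij} = \sqrt{n} \sum_{j=1}^{n} c_{ij} c_{1j} = 0 \tag{11}$$

and

$$\sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} c_{ij} y_j \Big)^2 = \sum_{j=1}^{n} y_j^2. \tag{12}$$

Moreover,

$$u_1 = \sqrt{n} \, \bar{y} \tag{13}$$

and

$$u_2 = b \Big\{ \sum_{i=1}^{n} (x_i - \bar{x})^2 \Big\}^{1/2}, \tag{14}$$

where b is a value assumed by rv B. Also $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$,
are normal rv's (being linear combinations of the Y's). Thus

$$E\{U_i \mid X_1, X_2, \ldots, X_n\} = \sum_{j=1}^{n} c_{ij} E\{Y_j \mid X_1, X_2, \ldots, X_n\} = 0 \tag{15}$$

and

$$\begin{aligned}
\operatorname{cov}\{U_i, U_k \mid X_1, X_2, \ldots, X_n\} &= \operatorname{cov}\Big\{ \sum_{j=1}^{n} c_{ij} Y_j, \sum_{j'=1}^{n} c_{kj'} Y_{j'} \,\Big|\, X_1, X_2, \ldots, X_n \Big\} \\
&= \sum_{j=1}^{n} \sum_{j'=1}^{n} c_{ij} c_{kj'} \operatorname{cov}\{Y_j, Y_{j'} \mid X_1, X_2, \ldots, X_n\} \\
&= \sum_{j=1}^{n} c_{ij} c_{kj}.
\end{aligned} \tag{16}$$

This last equality follows since

$$\operatorname{cov}\{Y_j, Y_{j'} \mid X_1, X_2, \ldots, X_n\} = \begin{cases} 0, & j \ne j', \\ 1, & j = j'. \end{cases}$$

From orthogonality, we have

$$\operatorname{cov}\{U_i, U_k \mid X_1, X_2, \ldots, X_n\} = \begin{cases} 0, & i \ne k, \\ 1, & i = k, \end{cases} \tag{17}$$

and it follows that the rv's $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$, are
mutually independent $\mathcal{N}(0, 1)$. Now

$$\sum_{j=1}^{n} (y_j - \bar{y})^2 = \sum_{j=1}^{n} y_j^2 - n\bar{y}^2 = \sum_{i=1}^{n} u_i^2 - u_1^2 = \sum_{i=2}^{n} u_i^2, \tag{18}$$

so that, by (3) and (14),

$$R^2 = \frac{U_2^2}{\sum_{i=2}^{n} U_i^2} = \frac{U_2^2}{U_2^2 + \sum_{i=3}^{n} U_i^2}. \tag{19}$$

Writing $U = U_2^2$ and $W = \sum_{i=3}^{n} U_i^2$, we see that the conditional distribution
of U, given $X_1, X_2, \ldots, X_n$, is $\chi^2(1)$, and that of W, given $X_1, X_2, \ldots, X_n$,
is $\chi^2(n-2)$. Moreover, U and W are independent. Since these conditional
distributions do not involve the X's, we see that U and W are
unconditionally independent with $\chi^2(1)$ and $\chi^2(n-2)$ distributions, respectively.
The joint pdf of U and W is

$$f(u, w) = \frac{1}{\Gamma(1/2) \sqrt{2}} \, u^{1/2 - 1} e^{-u/2} \cdot \frac{1}{\Gamma[(n-2)/2] \, 2^{(n-2)/2}} \, w^{(n-2)/2 - 1} e^{-w/2}.$$

Let $u + w = z$; then $u = r^2 z$ and $w = z(1 - r^2)$. The Jacobian of this
transformation is z, so that the joint pdf of $R^2$ and Z is given by

$$\frac{1}{\Gamma(1/2) \, \Gamma[(n-2)/2] \, 2^{(n-1)/2}} \, z^{(n-1)/2 - 1} e^{-z/2} \, (r^2)^{-1/2} (1 - r^2)^{(n-2)/2 - 1}.$$

The marginal pdf of $R^2$ is easily computed as

$$\frac{\Gamma[(n-1)/2]}{\Gamma(1/2) \, \Gamma[(n-2)/2]} \, (r^2)^{-1/2} (1 - r^2)^{(n-4)/2}, \qquad 0 \le r^2 \le 1. \tag{20}$$

Finally, using Theorem 2.5.4, we get the pdf of R as

$$f_1(r) = \frac{\Gamma[(n-1)/2]}{\Gamma(1/2) \, \Gamma[(n-2)/2]} \, (1 - r^2)^{(n-4)/2}, \qquad -1 \le r \le 1.$$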
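As for the distribution of B, note that the conditional pdf of
$U_2 = \sqrt{n-1} \, B S_1$, given $X_1, X_2, \ldots, X_n$, is $\mathcal{N}(0, 1)$, so that the conditional
pdf of B, given $X_1, X_2, \ldots, X_n$, is $\mathcal{N}\big(0, 1/\sum_{i=1}^{n} (x_i - \bar{x})^2\big)$. Let us write
$\Lambda = (n-1) S_1^2$. Then the pdf of rv $\Lambda$ is that of a $\chi^2(n-1)$ rv. Thus the
joint pdf of B and $\Lambda$ is given by

$$h(b, \lambda) = g(b \mid \lambda) \, h_2(\lambda), \tag{21}$$

where $g(b \mid \lambda)$ is $\mathcal{N}(0, 1/\lambda)$, and $h_2(\lambda)$ is $\chi^2(n-1)$. We have

$$\begin{aligned}
h_1(b) &= \int_0^\infty h(b, \lambda) \, d\lambda \\
&= \frac{1}{2^{n/2} \, \Gamma(1/2) \, \Gamma[(n-1)/2]} \int_0^\infty \lambda^{n/2 - 1} e^{-(\lambda/2)(1 + b^2)} \, d\lambda \\
&= \frac{\Gamma(n/2)}{\Gamma(1/2) \, \Gamma[(n-1)/2]} \, \frac{1}{(1 + b^2)^{n/2}}, \qquad -\infty < b < \infty.
\end{aligned} \tag{22}$$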


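To complete the proof let us write

$$X_i = \mu_1 + X_i^* \sigma_1 \quad \text{and} \quad Y_i = \mu_2 + Y_i^* \sigma_2,$$

where $X_i^* \sim \mathcal{N}(0, 1)$ and $Y_i^* \sim \mathcal{N}(0, 1)$. Then $X_i \sim \mathcal{N}(\mu_1, \sigma_1^2)$,
$Y_i \sim \mathcal{N}(\mu_2, \sigma_2^2)$, and

$$R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\big\{ \sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \big\}^{1/2}} = \frac{\sum_{i=1}^{n} (X_i^* - \bar{X}^*)(Y_i^* - \bar{Y}^*)}{\big\{ \sum_{i=1}^{n} (X_i^* - \bar{X}^*)^2 \sum_{i=1}^{n} (Y_i^* - \bar{Y}^*)^2 \big\}^{1/2}} = R^*, \tag{23}$$

so that the pdf of R is the same as derived above. Also

$$B = \frac{\sigma_1 \sigma_2 \sum_{i=1}^{n} (X_i^* - \bar{X}^*)(Y_i^* - \bar{Y}^*)}{\sigma_1^2 \sum_{i=1}^{n} (X_i^* - \bar{X}^*)^2} = \frac{\sigma_2}{\sigma_1} B^*, \tag{24}$$

where the pdf of $B^*$ is given by (22). Relations (22) and (24) are used to
find the pdf of B. We leave the reader to carry out these simple details.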


Remark 3. It is remarkable that for fixed n the sampling distribution of R
is independent of $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$. In the general case where $\rho \ne 0$, one can
show that for fixed n the distribution of R depends only on $\rho$. See, for
example, Cramér [18], page 398.


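Remark 4. Let us change the variable to

$$T = \frac{R}{\sqrt{1 - R^2}} \sqrt{n - 2}. \tag{25}$$

Then

$$1 - R^2 = \Big( 1 + \frac{T^2}{n - 2} \Big)^{-1},$$

and the pdf of T is given by

$$p(t) = \frac{1}{\sqrt{n-2}} \, \frac{1}{B[(n-2)/2, 1/2]} \, \frac{1}{[1 + t^2/(n-2)]^{(n-1)/2}}, \tag{26}$$

which is the pdf of a t-statistic with $n - 2$ d.f. Thus T defined by (25) has
a $t(n-2)$ distribution, provided that $\rho = 0$. This result facilitates the
computation of probabilities under the pdf of R when $\rho = 0$.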
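
The change of variable in Remark 4 can be checked numerically: if $t$ is obtained from $r$ by (25), then $f_1(r) = p(t) \, dt/dr$ with $dt/dr = \sqrt{n-2} \, (1 - r^2)^{-3/2}$. The sketch below assumes $\rho = 0$; the function names and the values $n = 12$, $r = 0.35$ are illustrative choices, not taken from the text.

```python
import math

def pdf_r(r, n):
    """Density (6) of R when rho = 0."""
    c = math.gamma((n - 1) / 2) / (math.gamma(0.5) * math.gamma((n - 2) / 2))
    return c * (1 - r * r) ** ((n - 4) / 2)

def pdf_t(t, n):
    """Density (26): Student's t with n - 2 degrees of freedom."""
    beta = math.gamma((n - 2) / 2) * math.gamma(0.5) / math.gamma((n - 1) / 2)
    return (1 + t * t / (n - 2)) ** (-(n - 1) / 2) / (math.sqrt(n - 2) * beta)

def t_from_r(r, n):
    """Transformation (25)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n, r = 12, 0.35                  # illustrative values only
t = t_from_r(r, n)
dt_dr = math.sqrt(n - 2) * (1 - r * r) ** -1.5
lhs = pdf_r(r, n)                # density of R at r
rhs = pdf_t(t, n) * dt_dr        # density of T at t, times the Jacobian
```

Because the map $r \mapsto t$ is increasing, $P\{R > r\} = P\{T > t\}$, which is how tables of the t-distribution are used for R.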


Remark 5. To compute the pdf of $B_{X|Y} = R(S_1/S_2)$, the sample regression
coefficient of X on Y, all we need to do is to interchange $\sigma_1$ and $\sigma_2$ in (7).


Remark 6. In the line of regression of Y on X, namely,

$$y - \bar{Y} = R \, \frac{S_2}{S_1} \, (x - \bar{X}),$$

we have already computed the pdf of the slope. We leave the reader to
compute the distribution of the intercept

$$A = A_{Y|X} = \bar{Y} - B\bar{X}.$$

This is easily done if we note that $(\bar{X}, \bar{Y})$ has a bivariate normal distribution,
B has the distribution given by (7), and $(\bar{X}, \bar{Y})$ and B are independent
(by Theorem 1).

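Remark 7. From (7) we can compute the mean and variance of B. For
$n > 2$, clearly

$$EB = 0,$$

and for $n > 3$, we can show that

$$EB^2 = \operatorname{var}(B) = \frac{\sigma_2^2}{\sigma_1^2} \, \frac{1}{n - 3}.$$

Moreover,

$$EA = E\bar{Y} - EB \, E\bar{X} = \mu_2,$$

and, since B is independent of $(\bar{X}, \bar{Y})$ and $EB = 0$,

$$\operatorname{var}(B\bar{X}) = EB^2 \, E\bar{X}^2 = \frac{\sigma_2^2}{\sigma_1^2 (n - 3)} \Big( \mu_1^2 + \frac{\sigma_1^2}{n} \Big)$$

and

$$\operatorname{var}(A) = \frac{\sigma_2^2}{n} + \frac{\sigma_2^2}{\sigma_1^2 (n - 3)} \Big( \mu_1^2 + \frac{\sigma_1^2}{n} \Big).$$

Similarly, we can use (6) to compute the mean and variance of R. We
have, for $n > 4$,

$$ER = 0$$

and

$$ER^2 = \operatorname{var}(R) = \frac{1}{n - 1}.$$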
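
The moments of R quoted in Remark 7 can be confirmed by integrating the density (6) numerically. This is a rough sketch assuming $\rho = 0$; the midpoint rule, the step count, and the choice $n = 10$ are illustrative assumptions.

```python
import math

def pdf_r(r, n):
    """Density (6) of R when rho = 0."""
    c = math.gamma((n - 1) / 2) / (math.gamma(0.5) * math.gamma((n - 2) / 2))
    return c * (1 - r * r) ** ((n - 4) / 2)

def midpoint_integral(f, a, b, steps=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

n = 10  # illustrative sample size
total = midpoint_integral(lambda r: pdf_r(r, n), -1.0, 1.0)
mean = midpoint_integral(lambda r: r * pdf_r(r, n), -1.0, 1.0)
second = midpoint_integral(lambda r: r * r * pdf_r(r, n), -1.0, 1.0)
```

Here `total` should be close to 1, `mean` close to 0 by symmetry, and `second` close to $1/(n-1) = 1/9$.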


PROBLEMS 7.6


1. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal
population with $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \operatorname{var}(Y) = \sigma^2$, and $\operatorname{cov}(X, Y) = \rho\sigma^2$.
Let $\bar{X}$, $\bar{Y}$ denote the corresponding sample means, $S_1^2$, $S_2^2$ the corresponding
sample variances, and $S_{11}$ the sample covariance. Write $R = 2S_{11}/(S_1^2 + S_2^2)$.
Find the pdf of R.
[Hint: Let $U = (X + Y)/2$ and $V = (X - Y)/2$, and observe that the random
vector $(U, V)$ is also bivariate normal. In fact, U and V are independent.]


2. Compute the sampling distribution of the intercept
$$A = A_{Y|X} = \bar{Y} - B\bar{X},$$
assuming that $EX_i = 0$.

3. Let X and Y be independent normal rv's. A sample of $n = 11$ observations on
$(X, Y)$ produces sample correlation coefficient $r = .40$. Find the probability of
obtaining a value of R that exceeds the observed value.


4. Let $X_1$, $X_2$ be jointly normally distributed with zero means, unit variances, and
correlation coefficient $\rho$. Let S be a $\chi^2(n)$ rv that is independent of $(X_1, X_2)$. Then
the joint distribution of $Y_1 = X_1/\sqrt{S/n}$ and $Y_2 = X_2/\sqrt{S/n}$ is known as a central
bivariate t-distribution. Find the joint pdf of $(Y_1, Y_2)$ and the marginal pdf's of $Y_1$
and $Y_2$, respectively.


5. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution with
parameters $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \operatorname{var}(Y_i) = \sigma^2$, and $\operatorname{cov}(X_i, Y_i) = \rho\sigma^2$,
$i = 1, 2, \ldots, n$. Find the distribution of the statistic
$$T(X, Y) = \frac{\sqrt{n} \, [(\bar{X} - \mu_1) - (\bar{Y} - \mu_2)]}{\sqrt{\sum_{i=1}^{n} (X_i - Y_i - \bar{X} + \bar{Y})^2}}.$$


CHAPTER 8


The Theory of Point Estimation


8.1 INTRODUCTION


In this chapter we study the theory of point estimation. Suppose, for example,
that a random variable X is known to have a normal distribution $\mathcal{N}(\mu, \sigma^2)$,
but we do not know one of the parameters, say $\mu$. Suppose further that a
sample $X_1, X_2, \ldots, X_n$ is taken on X. The problem of point estimation is to
pick a (one-dimensional) statistic $T(X_1, X_2, \ldots, X_n)$ that best estimates the
parameter $\mu$. The numerical value of T when the realization is $x_1, x_2, \ldots, x_n$
is frequently called an estimate of $\mu$, while the statistic T is called an
estimator of $\mu$. However, we will not observe this distinction and will use
the word "estimate" for both the function T and its numerical value. If
both $\mu$ and $\sigma^2$ are unknown, we seek a joint statistic $T = (U, V)$ as an
estimate of $(\mu, \sigma^2)$.

In Sections 2 and 3 we do the preliminary work of formally describing the 
problem and investigate the desirable properties of an estimate that we may 
seek. Sections 4 and 5 deal with the theory of unbiased estimation. In Sections 
6, 7, and 8 we discuss some commonly used methods of estimation, while 
in Section 9 we consider in detail the construction of a minimal sufficient 
statistic. The material of Section 9 properly belongs in Section 3, but we do 
not need it in Sections 4 through 8. Moreover, it is an important concept 
that merits special attention—hence the separation. 


8.2 THE PROBLEM OF POINT ESTIMATION 


Let X be an rv defined on a probability space $(\Omega, \mathcal{S}, P)$. Suppose that the
df F of X depends on a certain number of parameters, and suppose further
that the functional form of F is known except perhaps for a finite number of
these parameters. Let $\theta$ be the vector of (unknown) parameters associated
with F.


Definition 1. The set of all admissible values of the parameters of a df F 
is called the parameter space. 


We will write $F_\theta$ for the df of X if $\theta$ is the vector of parameters associated
with the df of X. Let $\theta$ take values in the parameter set $\Theta$. Then the set
$\{F_\theta : \theta \in \Theta\}$ is called the family of df's of X. Similarly we speak of the family
of pdf's of X if X is continuous, and the family of pmf's if X is discrete.


Example 1. Let $X \sim b(n, p)$, and p be unknown. Then $\Theta = \{p : 0 < p < 1\}$
and $\{b(n, p) : 0 < p < 1\}$ is the family of possible pmf's of X.


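Example 2. Let $X \sim \mathcal{N}(\mu, \sigma^2)$. If both $\mu$ and $\sigma^2$ are unknown,
$\Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty, \sigma^2 > 0\}$. If $\mu = \mu_0$, say, and $\sigma^2$ is not
known, $\Theta = \{(\mu_0, \sigma^2) : \sigma^2 > 0\}$ or, simply, $\Theta = \{\sigma^2 : \sigma^2 > 0\} = (0, \infty)$.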


Let X be an rv with df $F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is the vector of
unknown parameters. Suppose that $\theta \in \Theta$. Let $X_1, X_2, \ldots, X_n$ be a random
sample of n iid observations on X. In this chapter we investigate the problem
of approximating $\theta$ on the basis of the available sample. In the following
we restrict ourselves to the case where $\Theta \subseteq \mathcal{R}$ or to the case where we have
to approximate some function d of $\theta$ into $\mathcal{R}$.


Definition 2. Let $X_1, X_2, \ldots, X_n$ be a sample from $F_\theta$, where $\theta \in \Theta \subseteq \mathcal{R}$. A
statistic $T(X_1, X_2, \ldots, X_n)$ is said to be a (point) estimate of $\theta$ if T maps $\mathcal{R}_n$
into $\Theta$.


The problem of parametric point estimation is to find an estimate T, for
the unknown parameter $\theta$, that has some nice properties. We remark that we
have already encountered a problem of point estimation in Section 4.8.


Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$, where $\lambda$ is not
known. Then $\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$ is an estimate of $\lambda$, and so also is $[2/n(n+1)]
\sum_{i=1}^{n} i X_i$. Indeed, any $X_i$ is an estimate of $\lambda$.


Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, p)$, where p is
unknown. Then $\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$ is an estimate of p. Some other estimates
are $T = \frac{1}{2}$, $T = X_1$, and $T = (X_1 + X_2)/2$.


From these examples it is clear that we need some criterion to choose
among possible estimates. In the following sections we will consider some
properties that we may require our estimates to possess and discuss some
commonly used methods of estimation. We conclude this section by emphasizing
that we demand in Definition 2 that T be a statistic (independent of
any unknown parameters). We also remark that we do not consider here the
much harder problem of (nonparametric) estimation when not even the
functional form of F is known. (See Chapter 13.)


8.3 PROPERTIES OF ESTIMATES 


We have seen that many estimates will be available to us in any given situation.
It is therefore desirable to investigate some properties of point estimates.
This will help us to decide which estimate to choose. We first consider the
concept of consistency.


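Definition 1. Let $X_1, X_2, \ldots$ be a sequence of iid rv's with common df
$F_\theta$, $\theta \in \Theta$. A sequence of point estimates $T_n(X_1, X_2, \ldots, X_n) = T_n$ will be
called consistent for $\theta$ if

$$T_n \xrightarrow{P} \theta \quad \text{as } n \to \infty$$

for each fixed $\theta \in \Theta$.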


Remark 1. Recall that $T_n \xrightarrow{P} \theta$ if and only if $P\{|T_n - \theta| > \varepsilon\} \to 0$ as $n \to \infty$
for every $\varepsilon > 0$. One can similarly define strong consistency of a sequence of
estimates $T_n$ if $T_n \xrightarrow{a.s.} \theta$. Sometimes one speaks of consistency in the rth mean
when $T_n \xrightarrow{r} \theta$. In what follows, "consistency" will mean weak consistency
of $T_n$ for $\theta$, that is, $T_n \xrightarrow{P} \theta$.


Remark 2. It is important to remember that consistency is essentially a
large-sample property. Moreover, we speak of consistency of a sequence of
estimates rather than one point estimate.

Example 1. Let $X_1, X_2, \ldots$ be iid $b(1, p)$ rv's. Then $EX_1 = p$, and it follows
by the WLLN that

$$\frac{\sum_{i=1}^{n} X_i}{n} \xrightarrow{P} p.$$

Thus $\bar{X}$ is consistent for p. Also $(\sum_{i=1}^{n} X_i + 1)/(n + 2) \xrightarrow{P} p$, so that a consistent
estimate need not be unique. Indeed, if $T_n \xrightarrow{P} p$, and $c_n \to 0$ as $n \to \infty$,
then $T_n + c_n \xrightarrow{P} p$.


Theorem 1. If $X_1, X_2, \ldots$ are iid rv's with common law $\mathcal{L}(X)$, and
$E|X|^p < \infty$ for some positive integer p, then

$$n^{-1} \sum_{i=1}^{n} X_i^k \xrightarrow{P} EX^k \quad \text{for } 1 \le k \le p,$$

and $n^{-1} \sum_{i=1}^{n} X_i^k$ is consistent for $EX^k$, $1 \le k \le p$. Moreover, if $c_n$ is any
sequence of constants such that $c_n \to 0$ as $n \to \infty$, then $\{n^{-1} \sum_{i=1}^{n} X_i^k + c_n\}$ is
also consistent for $EX^k$, $1 \le k \le p$.


This is simply a restatement of the WLLN for iid rv’s. 
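
The two consistent estimates of p in Example 1 can be watched converging in a small simulation. The sketch below is illustrative only: the value $p = 0.3$, the seed, and the sample sizes are assumptions made for the demonstration.

```python
import random

def bernoulli_estimates(p, n, rng):
    """Return (X-bar, (sum X_i + 1)/(n + 2)) for n simulated b(1, p) trials."""
    total = sum(1 for _ in range(n) if rng.random() < p)
    return total / n, (total + 1) / (n + 2)

rng = random.Random(12345)  # fixed seed so that the run is reproducible
p = 0.3
results = {n: bernoulli_estimates(p, n, rng) for n in (100, 10_000, 200_000)}

for n, (xbar, alt) in results.items():
    print(n, round(xbar, 4), round(alt, 4))
```

Both estimates settle near p as n grows, and their difference is of order $1/n$, which is the content of the statement $T_n + c_n \xrightarrow{P} p$ when $c_n \to 0$.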


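Example 2. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(\mu, \sigma^2)$ rv's. If $S^2$ is the sample variance,
we know that $(n-1) S^2/\sigma^2 \sim \chi^2(n-1)$. Thus $E(S^2/\sigma^2) = 1$ and
$\operatorname{var}(S^2/\sigma^2) = 2/(n-1)$. It follows that

$$P\{|S^2 - \sigma^2| > \varepsilon\} \le \frac{\operatorname{var}(S^2)}{\varepsilon^2} = \frac{2\sigma^4}{(n-1)\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$

Thus $S^2 \xrightarrow{P} \sigma^2$. Actually, this result holds for any sequence of iid rv's with
$E|X|^4 < \infty$ and can be obtained from Theorem 1. See Example 6.3.4.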


Example 2 is a particular case of the following theorem. 


Theorem 2. If $T_n$ is a sequence of estimates such that $ET_n \to \theta$ and
$\operatorname{var}(T_n) \to 0$ as $n \to \infty$, then $T_n$ is consistent for $\theta$.


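Proof. We have

$$\begin{aligned}
P\{|T_n - \theta| > \varepsilon\} &\le \varepsilon^{-2} E(T_n - ET_n + ET_n - \theta)^2 \\
&= \varepsilon^{-2} \{\operatorname{var}(T_n) + (ET_n - \theta)^2\} \to 0 \quad \text{as } n \to \infty.
\end{aligned}$$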


We next consider the concept of invariance. Consider, for example, an
experiment in which the length of life of a piece of equipment is measured.
Then an estimate obtained from the measurements expressed in hours and
minutes must agree with an estimate obtained from the measurements
expressed in minutes. Similarly, if the relative speed of two cars is to be
estimated, the estimates based on speed measured in miles per hour and in
kilometers per hour must correspond.

We now formalize these notions. Let $\mathcal{G}$ be a group of Borel-measurable
functions of $\mathcal{R}_n$ onto itself. The group operation is composition, that is, if
$g_1$ and $g_2$ are mappings from $\mathcal{R}_n$ onto itself, $g_2 g_1$ is defined by $g_2 g_1(x)
= g_2(g_1(x))$. Also, $\mathcal{G}$ is closed under composition and inverse, so that all
maps in $\mathcal{G}$ are one-to-one and onto.

Definition 2. A family of probability distributions $\{P_\theta : \theta \in \Theta\}$ is said to be
invariant under a group $\mathcal{G}$ if, for each $g \in \mathcal{G}$ and every $\theta \in \Theta$, we can find a
unique $\theta' \in \Theta$ such that the distribution of $g(X)$ (which is an rv) is given by
$P_{\theta'}$ whenever X has the distribution $P_\theta$. We write $\theta' = \bar{g}\theta$.


Remark 3. Here $X = (X_1, X_2, \ldots, X_n)$ is an n-dimensional rv so that $\mathcal{R}_n$ is
the space of values of X. $P_\theta$ is the distribution of X which determines the
df of $g(X)$. (See Section 4.4.)


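Remark 4. The condition that $\{P_\theta : \theta \in \Theta\}$ be invariant under $\mathcal{G}$ is the same
as saying that

$$P_\theta\{g(X) \in A\} = P_{\bar{g}\theta}\{X \in A\} \tag{1}$$

for all Borel subsets A in $\mathcal{R}_n$. Equivalently,

$$F_\theta^*(x_1, x_2, \ldots, x_n) = F_{\bar{g}\theta}(x_1, x_2, \ldots, x_n), \tag{2}$$

where $F^*$ is the df of $g(X_1, X_2, \ldots, X_n)$ and F is the df of $(X_1, X_2, \ldots, X_n)$.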


Example 3. Let $X \sim b(n, p)$, $0 \le p \le 1$. Let $\mathcal{G}$ be the group consisting of
the identity map e and the mapping g, where $g(x) = n - x$. Then $g(X)$ is
$b(n, 1 - p)$, and we see that $\bar{g}p = 1 - p$. The group $\mathcal{G}$ leaves the family
$\{b(n, p) : 0 \le p \le 1\}$ invariant.


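Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. The joint pdf
of $(X_1, X_2, \ldots, X_n)$ is given by

$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\Big\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \Big\}.$$

Consider the group of transformations $\mathcal{G}$, which contains elements g:

$$g(x_1, \ldots, x_n) = (ax_1 + b, \ldots, ax_n + b), \qquad a > 0, \; -\infty < b < \infty.$$

The pdf of $g(X)$ is given by

$$f^*(x_1, x_2, \ldots, x_n) = \frac{1}{(a\sigma\sqrt{2\pi})^n} \exp\Big\{ -\frac{1}{2a^2\sigma^2} \sum_{i=1}^{n} (x_i - a\mu - b)^2 \Big\}.$$

We see that

$$\bar{g}(\mu, \sigma^2) = (a\mu + b, a^2\sigma^2), \qquad -\infty < a\mu + b < \infty, \; a^2\sigma^2 > 0.$$

Thus $\mathcal{G}$ leaves the family of pdf's $\{f : -\infty < \mu < \infty, \sigma^2 > 0\}$ invariant.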


Definition 3. Let $\mathcal{G}$ be a group of transformations that leaves the family
$\{F_\theta : \theta \in \Theta\}$ of df's invariant. An estimate T is said to be invariant under $\mathcal{G}$ if

$$T(g(X_1), g(X_2), \ldots, g(X_n)) = T(X_1, X_2, \ldots, X_n) \tag{3}$$

for all $g \in \mathcal{G}$.


The principle of invariance says that we should restrict attention to 
invariant estimates. A comprehensive investigation of the invariance prin- 
ciple and invariant estimates is beyond the scope of this book. We will con- 
sider here only some commonly used invariant estimates. 

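Definition 4. An estimate is said to be location invariant if

$$T(X_1 + a, X_2 + a, \ldots, X_n + a) = T(X_1, X_2, \ldots, X_n), \qquad a \in \mathcal{R}; \tag{4}$$

scale invariant if

$$T(cX_1, cX_2, \ldots, cX_n) = T(X_1, X_2, \ldots, X_n), \qquad c \ne 0, \; c \in \mathcal{R}; \tag{5}$$

location and scale invariant if

$$T(cX_1 + a, cX_2 + a, \ldots, cX_n + a) = T(X_1, X_2, \ldots, X_n), \qquad c \ne 0, \; a, c \in \mathcal{R}; \tag{6}$$

and permutation invariant if

$$T(X_{i_1}, X_{i_2}, \ldots, X_{i_n}) = T(X_1, X_2, \ldots, X_n) \tag{7}$$

for all permutations $(i_1, i_2, \ldots, i_n)$ of $1, 2, \ldots, n$.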


Remark 5. In Definition 4 it is assumed that the family of distributions is 
left invariant by the group under consideration in the sense of Definition 2. 


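Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ rv's. Then the family
$\{\mathcal{N}(\mu, \sigma^2) : -\infty < \mu < \infty, \sigma^2 > 0\}$ is invariant under translations. The estimate
$S^2$ for $\sigma^2$ is invariant under this group since

$$\begin{aligned}
T(g(X_1), g(X_2), \ldots, g(X_n)) &= T(X_1 + a, \ldots, X_n + a) \\
&= \frac{1}{n-1} \sum_{i=1}^{n} (X_i + a - \bar{X} - a)^2 = S^2 \\
&= T(X_1, X_2, \ldots, X_n).
\end{aligned}$$

But $S^2$ is not invariant under scale changes since

$$T(cX_1, cX_2, \ldots, cX_n) = \frac{1}{n-1} \sum_{i=1}^{n} (cX_i - c\bar{X})^2 = c^2 S^2.$$

Clearly the family of joint densities of $(X_1, X_2, \ldots, X_n)$ is also invariant under
permutations. The estimate $\bar{X}$ is a permutation-invariant estimate of $\mu$ and
$S^2$ is a permutation-invariant estimate of $\sigma^2$.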
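
The three invariance statements of Example 5 are easy to check on a concrete sample. In this minimal sketch the data, the shift `a`, and the scale factor `c` are arbitrary illustrative values; `statistics.variance` computes the sample variance with divisor $n - 1$.

```python
import math
import statistics

xs = [2.5, 3.1, 4.7, 5.0, 8.2]   # hypothetical observations
a, c = 10.0, 3.0                 # arbitrary shift and scale factor

s2 = statistics.variance(xs)                        # S^2
s2_shift = statistics.variance([x + a for x in xs])  # S^2 after translation
s2_scale = statistics.variance([c * x for x in xs])  # S^2 after scale change
mean_perm = statistics.mean(list(reversed(xs)))      # X-bar after a permutation
s2_perm = statistics.variance(list(reversed(xs)))    # S^2 after a permutation
```

The translated sample gives the same $S^2$, the rescaled sample gives $c^2 S^2$, and reordering the observations changes neither $\bar{X}$ nor $S^2$.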


We next consider the important concept of a sufficient statistic. After the
completion of any experiment, the job of a statistician is to interpret the data
he has collected and to draw some statistically valid conclusions about the
population under investigation. The raw data by themselves, besides being
costly to store, are not suitable for this purpose. Therefore the statistician
would like to condense the data by computing some statistics from them
and to base his analysis on these statistics, provided that there is no loss of
information in doing so. In many problems of statistical inference a function
of the observations contains as much information about the unknown
parameter as do all the observed values. The following example illustrates
this point.


Example 6. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, where $\mu$ is
unknown. Suppose that we transform variables $X_1, X_2, \ldots, X_n$ to $Y_1, Y_2, \ldots,
Y_n$ with the help of an orthogonal transformation so that $Y_1$ is $\mathcal{N}(\sqrt{n}\,\mu, 1)$,
$Y_2, \ldots, Y_n$ are iid $\mathcal{N}(0, 1)$, and $Y_1, Y_2, \ldots, Y_n$ are independent. (Take $y_1 =
\sqrt{n}\,\bar{x}$ and, for $k = 2, \ldots, n$, $y_k = [(k-1)x_k - (x_1 + \cdots + x_{k-1})]/\sqrt{k(k-1)}$.)
To estimate $\mu$ we can use either the observed values of $X_1, X_2, \ldots, X_n$ or
simply the observed value of $Y_1 = \sqrt{n}\,\bar{X}$. The rv's $Y_2, Y_3, \ldots, Y_n$ provide
no information about $\mu$. Clearly, $Y_1$ is preferable since one need not keep a
record of all the observations; it suffices to cumulate the observations and
compute $y_1$. Any analysis of the data based on $y_1$ is just as effective as any
analysis that could be based on the $x_i$'s.
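
The orthogonal transformation named in Example 6 can be written out directly. The sketch below (the function name `helmert` and the sample values are assumptions made for illustration) computes $y_1, \ldots, y_n$ from $x_1, \ldots, x_n$ and exposes two consequences of the construction: the sum of squares is preserved, and $y_2, \ldots, y_n$ do not change when every observation is shifted by the same constant, which is why they carry no information about $\mu$.

```python
import math

def helmert(xs):
    """Apply the orthogonal transformation of Example 6 to the list xs."""
    n = len(xs)
    ys = [math.sqrt(n) * sum(xs) / n]          # y_1 = sqrt(n) * x-bar
    for k in range(2, n + 1):
        head = sum(xs[: k - 1])                # x_1 + ... + x_{k-1}
        ys.append(((k - 1) * xs[k - 1] - head) / math.sqrt(k * (k - 1)))
    return ys

xs = [1.2, -0.4, 3.3, 0.7, 2.1]                # hypothetical observations
ys = helmert(xs)
ys_shifted = helmert([x + 5.0 for x in xs])    # same data with mu moved by 5

sum_sq_x = sum(x * x for x in xs)
sum_sq_y = sum(y * y for y in ys)
```

Only the first coordinate of `ys_shifted` differs from that of `ys`.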


A rigorous definition of the concept involved in the above discussion 
requires the notion of a conditional distribution and is beyond the scope of 
this book. In view of the discussion of conditional probability distributions 
in Section 4.2, the following definition will suffice for our purposes. 


Definition 5. Let $X = (X_1, X_2, \ldots, X_n)$ be a sample from $\{F_\theta : \theta \in \Theta\}$. A
statistic $T = T(X)$ is sufficient for $\theta$ or for the family of distributions
$\{F_\theta : \theta \in \Theta\}$ if and only if the conditional distribution of X, given $T = t$, does
not depend on $\theta$ (except perhaps for a null set A, $P_\theta\{T \in A\} = 0$ for all $\theta$).


Remark 6. The outcome $X_1, X_2, \ldots, X_n$ is always sufficient, but we will
exclude this trivial statistic from consideration. According to Definition 5, if
T is sufficient for $\theta$, we need only concentrate on T since it exhausts all the
information that the sample has about $\theta$. In practice, there will be several
sufficient statistics for a family of distributions, and the question arises as to
which of these should be used in a given problem. Fortunately we do not
need to make this choice in the problems of elementary statistical inference
considered in this book. We will return to this topic in more detail in
Section 9.

Example 7. We show that the statistic $Y_1$ in Example 6 is sufficient for $\mu$.
By construction $Y_2, \ldots, Y_n$ are iid $\mathcal{N}(0, 1)$ rv's that are independent of $Y_1$.
Hence the conditional distribution of $Y_2, \ldots, Y_n$, given $Y_1 = \sqrt{n}\,\bar{X}$, is the
same as the unconditional distribution of $(Y_2, \ldots, Y_n)$, which is multivariate
normal with mean $(0, 0, \ldots, 0)$ and dispersion matrix $I_{n-1}$. Since this distribution
is independent of $\mu$, the conditional distribution of $(Y_1, Y_2, \ldots, Y_n)$,
and hence $(X_1, X_2, \ldots, X_n)$, given $Y_1 = y_1$ is also independent of $\mu$ and $Y_1$
is sufficient. (See Problem 5.4.1.)

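Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's. Intuitively, if a loaded
coin is tossed with probability p of heads n times, it seems unnecessary to
know which toss resulted in a head. To estimate p, it should be sufficient
to know the number of heads in n trials. We show that this is consistent
with our definition. Let $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i$. Then

$$P\Big\{ X_1 = x_1, \ldots, X_n = x_n \,\Big|\, \sum_{i=1}^{n} X_i = t \Big\} = \frac{P\{X_1 = x_1, \ldots, X_n = x_n, T = t\}}{\binom{n}{t} p^t (1 - p)^{n-t}}$$

if $\sum_{i=1}^{n} x_i = t$, and $= 0$ otherwise. Thus, for $\sum_{i=1}^{n} x_i = t$, we have

$$P\Big\{ X_1 = x_1, \ldots, X_n = x_n \,\Big|\, \sum_{i=1}^{n} X_i = t \Big\} = \frac{p^{\sum x_i} (1 - p)^{n - \sum x_i}}{\binom{n}{t} p^t (1 - p)^{n-t}} = \frac{1}{\binom{n}{t}},$$

which is independent of p. It is therefore sufficient to concentrate on $\sum_{i=1}^{n} X_i$.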
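
The computation in Example 8 can be reproduced by brute-force enumeration with exact rational arithmetic. In this sketch the function name and the values $n = 4$, $t = 2$, $p = 1/5$, and $p = 7/10$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_sum(p, n, t):
    """Exact P{X = x | sum X_i = t} for every 0-1 vector x whose sum is t."""
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x))
        for x in product((0, 1), repeat=n)
        if sum(x) == t
    }
    total = sum(joint.values())   # P{sum X_i = t}
    return {x: prob / total for x, prob in joint.items()}

n, t = 4, 2
cond_a = conditional_given_sum(Fraction(1, 5), n, t)
cond_b = conditional_given_sum(Fraction(7, 10), n, t)
```

Both dictionaries assign probability $1/\binom{4}{2} = 1/6$ to each of the six arrangements, whatever the value of p.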
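
Example 9. Let $X_1$, $X_2$ be iid $P(\lambda)$ rv's. Then $X_1 + X_2$ is sufficient for $\lambda$, for

$$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \begin{cases} \dfrac{P\{X_1 = x_1, X_2 = t - x_1\}}{P\{X_1 + X_2 = t\}} & \text{if } t = x_1 + x_2, \; x_i = 0, 1, 2, \ldots, \\ 0 & \text{otherwise.} \end{cases}$$

Thus, for $x_i = 0, 1, 2, \ldots$, $i = 1, 2$, $x_1 + x_2 = t$, we have

$$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \binom{t}{x_1} \Big( \frac{1}{2} \Big)^t,$$

which is independent of $\lambda$.

Not every statistic is sufficient.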


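Example 10. Let $X_1$, $X_2$ be iid $P(\lambda)$ rv's, and consider the statistic
$T = X_1 + 2X_2$. We have

$$\begin{aligned}
P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} &= \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}} \\
&= \frac{e^{-\lambda} (\lambda e^{-\lambda})}{P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}} \\
&= \frac{\lambda e^{-2\lambda}}{\lambda e^{-2\lambda} + (\lambda^2/2) e^{-2\lambda}} = \frac{1}{1 + (\lambda/2)},
\end{aligned}$$

and we see that $X_1 + 2X_2$ is not sufficient for $\lambda$.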
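
The dependence on $\lambda$ in Example 10 shows up immediately when the conditional probability is evaluated at two parameter values. The helper names and the values $\lambda = 1$ and $\lambda = 4$ below are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    """P{X = k} for X ~ P(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def cond_prob(lam):
    """P{X1 = 0, X2 = 1 | X1 + 2 X2 = 2} for iid P(lam) rv's X1, X2."""
    num = poisson_pmf(0, lam) * poisson_pmf(1, lam)
    # The event X1 + 2 X2 = 2 consists of (0, 1) and (2, 0) only.
    den = num + poisson_pmf(2, lam) * poisson_pmf(0, lam)
    return num / den

at_one = cond_prob(1.0)    # equals 1 / (1 + 1/2)
at_four = cond_prob(4.0)   # equals 1 / (1 + 2)
```

Since the two values differ, the conditional distribution given $T = X_1 + 2X_2$ changes with $\lambda$, in agreement with the closed form $1/(1 + \lambda/2)$.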


Definition 5 is not a constructive definition since it requires that we first 
guess a statistic T and then check to see whether T is sufficient. Moreover, 
the procedure for checking that T is sufficient is quite time-consuming. We 
now give a criterion for determining sufficient statistics. 


Theorem 3 (The Factorization Criterion). Let $X_1, X_2, \ldots, X_n$ be discrete
rv's with pmf $p_\theta(x_1, x_2, \ldots, x_n)$, $\theta \in \Theta$. Then $T(X_1, X_2, \ldots, X_n)$ is sufficient
for $\theta$ if and only if we can write

$$p_\theta(x_1, x_2, \ldots, x_n) = h(x_1, x_2, \ldots, x_n) \, g_\theta(T(x_1, x_2, \ldots, x_n)), \tag{8}$$

where h is a nonnegative function of $x_1, x_2, \ldots, x_n$ only and does not depend
on $\theta$, and $g_\theta$ is a nonnegative function of $\theta$ and $T(x_1, x_2, \ldots, x_n)$ only. The
statistic $T(X_1, \ldots, X_n)$ and parameter $\theta$ may be vectors.


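Proof. Let T be sufficient for $\theta$. Then $P\{X = x \mid T = t\}$ is independent of $\theta$,
and we may write

$$\begin{aligned}
P_\theta\{X = x\} &= P_\theta\{X = x, T(X_1, X_2, \ldots, X_n) = t\} \\
&= P_\theta\{T = t\} \, P\{X = x \mid T = t\},
\end{aligned}$$

provided that $P\{X = x \mid T = t\}$ is well defined.

For values of x for which $P_\theta\{X = x\} = 0$ for all $\theta$, let us define
$h(x_1, x_2, \ldots, x_n) = 0$, and for x for which $P_\theta\{X = x\} > 0$ for some $\theta$, we define

$$h(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}$$

and define

$$g_\theta(T(x_1, x_2, \ldots, x_n)) = P_\theta\{T(x_1, \ldots, x_n) = t\}.$$

Thus we see that (8) holds.
Conversely, suppose that (8) holds. Then for fixed $t_0$ we have

$$\begin{aligned}
P_\theta\{T = t_0\} &= \sum_{(x : T(x) = t_0)} P_\theta\{X = x\} \\
&= \sum_{(x : T(x) = t_0)} g_\theta(T(x)) \, h(x) \\
&= g_\theta(t_0) \sum_{(x : T(x) = t_0)} h(x).
\end{aligned}$$

Suppose that $P_\theta\{T = t_0\} > 0$ for some $\theta$. Then

$$P_\theta\{X = x \mid T = t_0\} = \frac{P_\theta\{X = x, T(X) = t_0\}}{P_\theta\{T(X) = t_0\}} = \begin{cases} 0 & \text{if } T(x) \ne t_0, \\[4pt] \dfrac{P_\theta\{X = x\}}{P_\theta\{T(X) = t_0\}} & \text{if } T(x) = t_0. \end{cases}$$

Thus, if $T(x) = t_0$, then

$$\frac{P_\theta\{X = x\}}{P_\theta\{T(X) = t_0\}} = \frac{g_\theta(t_0) \, h(x)}{g_\theta(t_0) \sum_{(x : T(x) = t_0)} h(x)},$$

which is independent of $\theta$, as asserted. This completes the proof.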


Remark 7. Theorem 3 holds also for the continuous case and, indeed, for
quite arbitrary families of distributions. The general proof is beyond the
scope of this book, and we refer the reader to Halmos and Savage [44] or
to Lehmann [70], pages 47-49. We will assume that the result holds for
the absolutely continuous case. We leave the reader to write the analogue
of (8) and to prove it, at least under the regularity conditions assumed in
Theorem 4.4.6.


Remark 8. Theorem 3 (and its analogue for the continuous case) holds if
$\theta$ is a vector of parameters and T is a random vector, and we say that T
is jointly sufficient for $\theta$. We emphasize that, even if $\theta$ is scalar, T may be
a vector (Example 14). If $\theta$ and T are vectors of the same dimension, and
if T is sufficient for $\theta$, it does not follow that the jth component of T is
sufficient for the jth component of $\theta$ (Example 13). The converse is true
under mild conditions (see Fraser [33], page 21).


Remark 9. If T is sufficient for $\theta$, any one-to-one function of T is also
sufficient. This follows from Theorem 3 since, if $U = k(T)$ is a one-to-one
function of T, then $t = k^{-1}(u)$, and we can write

$$f_\theta(x) = g_\theta(t) \, h(x) = g_\theta(k^{-1}(u)) \, h(x) = g_\theta^*(u) \, h(x).$$

If $T_1$, $T_2$ are two distinct sufficient statistics, then

$$f_\theta(x) = g_\theta(t_1) \, h_1(x) = g_\theta(t_2) \, h_2(x),$$

and it follows that $T_1$ is a function of $T_2$. It does not follow, however,
that every function of a sufficient statistic is itself sufficient. For example, in
sampling from a normal population, $\bar{X}$ is sufficient for the mean $\mu$ but $\bar{X}^2$
is not sufficient for $\mu$. Note that $\bar{X}$ is sufficient for $\mu^2$.


Remark 10. As a rule, Theorem 3 cannot be used to show that a given
statistic T is not sufficient. To do this, one would normally have to use the
definition of sufficiency. In most cases Theorem 3 will lead to a sufficient
statistic if it exists. However, it does not answer the question of whether a
family of densities (pmf's) admits a sufficient statistic (other than the trivial
sufficient statistic).


Remark 11. If $T(X)$ is sufficient for $\{F_\theta : \theta \in \Theta\}$, then T is sufficient for
$\{F_\theta : \theta \in \omega\}$, where $\omega \subseteq \Theta$. This follows trivially from the definition.


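Example 11. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's. Then $T = \sum_{i=1}^{n} X_i$ is
sufficient. We have

$$P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = p^{\sum x_i} (1 - p)^{n - \sum x_i},$$

and, taking

$$h(x_1, x_2, \ldots, x_n) = 1 \quad \text{and} \quad g_p(x_1, x_2, \ldots, x_n) = (1 - p)^n \Big( \frac{p}{1 - p} \Big)^{\sum x_i},$$

we see that T is sufficient.
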
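Example 12. Let $X_1, X_2, \ldots, X_n$ be iid rv's with common pmf

$$P_N\{X_i = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N; \; i = 1, 2, \ldots, n.$$

Then

$$\begin{aligned}
P_N\{X_1 = k_1, X_2 = k_2, \ldots, X_n = k_n\} &= \frac{1}{N^n} \quad \text{if } 1 \le k_1, \ldots, k_n \le N, \\
&= \frac{1}{N^n} \, \varphi\Big( 1, \min_{1 \le i \le n} k_i \Big) \, \varphi\Big( \max_{1 \le i \le n} k_i, N \Big),
\end{aligned}$$

where $\varphi(a, b) = 1$ if $b \ge a$, and $= 0$ if $b < a$. It follows, by taking
$g_N[\max(k_1, \ldots, k_n)] = (1/N^n) \, \varphi(\max_{1 \le i \le n} k_i, N)$ and $h = \varphi(1, \min_{1 \le i \le n} k_i)$,
that $\max(X_1, X_2, \ldots, X_n)$ is sufficient for the family of joint pmf's $P_N$.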


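Example 13. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown. The pdf of $(X_1, X_2, \ldots, X_n)$ is

$$\begin{aligned}
f_{\mu, \sigma^2}(x) &= \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\Big\{ -\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2} \Big\} \\
&= \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\Big( -\frac{\sum_{i=1}^{n} x_i^2}{2\sigma^2} + \frac{\mu \sum_{i=1}^{n} x_i}{\sigma^2} - \frac{n\mu^2}{2\sigma^2} \Big).
\end{aligned}$$

It follows that the statistic

$$T(X_1, \ldots, X_n) = \Big( \sum_{i=1}^{n} X_i, \sum_{i=1}^{n} X_i^2 \Big)$$

is jointly sufficient for the parameter $(\mu, \sigma^2)$. An equivalent sufficient
statistic that is frequently used is $T_1(X_1, \ldots, X_n) = (\bar{X}, S^2)$. Note that $\bar{X}$ is
not sufficient for $\mu$ if $\sigma^2$ is unknown, and $S^2$ is not sufficient for $\sigma^2$ if $\mu$ is
unknown. If, however, $\sigma^2$ is known, $\bar{X}$ is sufficient for $\mu$. If $\mu = \mu_0$ is
known, $\sum_{i=1}^{n} (X_i - \mu_0)^2$ is sufficient for $\sigma^2$.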
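
One practical reading of Example 13 is that two samples with the same values of $\sum x_i$ and $\sum x_i^2$ have identical likelihoods for every $(\mu, \sigma^2)$. The following sketch checks this on two hand-picked samples; the samples, the parameter values, and the function name are illustrative assumptions.

```python
import math

def normal_log_likelihood(xs, mu, sigma_sq):
    """Log of the joint N(mu, sigma_sq) density evaluated at the sample xs."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma_sq)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma_sq))

# Two different samples sharing sum = 3 and sum of squares = 9.
sample_a = [3.0, 0.0, 0.0]
sample_b = [-1.0, 2.0, 2.0]

params = [(0.0, 1.0), (1.5, 0.5), (-2.0, 4.0)]   # arbitrary (mu, sigma^2) pairs
gaps = [
    abs(normal_log_likelihood(sample_a, m, v) - normal_log_likelihood(sample_b, m, v))
    for m, v in params
]
```

Every entry of `gaps` is zero up to rounding error, so the two samples cannot be told apart by any likelihood-based analysis.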


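Example 14. Let $X_1, X_2, \ldots, X_n$ be a sample from pdf

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta}, & x \in \Big[ -\dfrac{\theta}{2}, \dfrac{\theta}{2} \Big], \; \theta > 0, \\[4pt] 0, & \text{otherwise.} \end{cases}$$

The joint pdf of $X_1, X_2, \ldots, X_n$ is given by

$$f_\theta(x_1, x_2, \ldots, x_n) = \frac{1}{\theta^n} \, I_A(x_1, \ldots, x_n),$$

where

$$A = \Big\{ (x_1, x_2, \ldots, x_n) : -\frac{\theta}{2} \le \min x_i \le \max x_i \le \frac{\theta}{2} \Big\}.$$

It follows that $(\min_{1 \le i \le n} X_i, \max_{1 \le i \le n} X_i)$ is sufficient for $\theta$.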


The concept of sufficiency is frequently used with another concept, called 
completeness, which we now define. 


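Definition 6. Let $\{f_\theta(x) ; \theta \in \Theta\}$ be a family of pdf's (or pmf's). We say that
this family is complete if

$$E_\theta g(X) = 0 \quad \text{for all } \theta \in \Theta$$

implies

$$P_\theta\{g(X) = 0\} = 1 \quad \text{for all } \theta \in \Theta.$$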


Definition 7. A statistic T(X) is said to be complete if the family of distri- 
butions of T is complete. 


In Definition 7 X will usually be a random vector. The family of distributions
of T is obtained from the family of distributions of $X_1, X_2, \ldots, X_n$ by
the usual transformation technique discussed in Section 4.4.


Example 15. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's. Then $T = \sum_{i=1}^{n} X_i$ is a
sufficient statistic. We show that T is also complete; that is, the family of
distributions of T, $\{b(n, p), 0 < p < 1\}$, is complete. The condition

$$E_p g(T) = \sum_{t=0}^{n} g(t) \binom{n}{t} p^t (1 - p)^{n-t} = 0 \quad \text{for all } p \in (0, 1)$$

may be rewritten as

$$(1 - p)^n \sum_{t=0}^{n} g(t) \binom{n}{t} \Big( \frac{p}{1 - p} \Big)^t = 0 \quad \text{for all } p \in (0, 1).$$

This is a polynomial in $p/(1 - p)$. Hence the coefficients must vanish, and
it follows that $g(t) = 0$ for $t = 0, 1, 2, \ldots, n$, as required.

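Example 16. Let X be $\mathcal{N}(0, \theta)$. Then the family of pdf's $\{\mathcal{N}(0, \theta), \theta > 0\}$
is not complete since $EX = 0$ and $g(x) = x$ is not identically zero. Note
that $T(X) = X^2$ is complete, for the pdf of $X^2 \sim \theta\chi^2(1)$ is given by

$$f_\theta(t) = \begin{cases} \dfrac{e^{-t/2\theta}}{\sqrt{2\pi\theta t}}, & t > 0, \\[4pt] 0, & \text{otherwise.} \end{cases}$$

Then

$$E_\theta g(T) = \frac{1}{\sqrt{2\pi\theta}} \int_0^\infty g(t) \, t^{-1/2} e^{-t/2\theta} \, dt = 0 \quad \text{for all } \theta > 0,$$

which holds if and only if $\int_0^\infty g(t) \, t^{-1/2} e^{-t/2\theta} \, dt = 0$, and using the uniqueness
property of Laplace transforms (Theorem P.2.27), it follows that

$$g(t) \, t^{-1/2} = 0 \quad \text{for } t > 0,$$

that is, $g(t) = 0$.
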
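Example 17. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\theta, \theta^2)$. Then
$T = (\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} X_i^2)$ is sufficient for $\theta$. However, T is not complete since

$$E_\theta \Big\{ 2 \Big( \sum_{i=1}^{n} X_i \Big)^2 - (n + 1) \sum_{i=1}^{n} X_i^2 \Big\} = 0 \quad \text{for all } \theta,$$

and the function $g(x_1, \ldots, x_n) = 2 (\sum_{i=1}^{n} x_i)^2 - (n + 1) \sum_{i=1}^{n} x_i^2$ is not identically
zero.
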
Example 18. Let $X \sim U(0, \theta)$, $\theta \in (0, \infty)$. We show that the family of pdf's
of X is complete. We need to show that

$$E_\theta g(X) = \frac{1}{\theta} \int_0^\theta g(x) \, dx = 0 \quad \text{for all } \theta > 0$$

if and only if $g(x) = 0$ for all x. In general, this result follows from Lebesgue
integration theory. If g is continuous, we differentiate both sides in

$$\int_0^\theta g(x) \, dx = 0$$

to get $g(\theta) = 0$ for all $\theta > 0$.

Now let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ rv's. Also, let $M_n = \max(X_1, X_2, \ldots,
X_n)$. Then the pdf of $M_n$ is given by

$$f_n(x \mid \theta) = \begin{cases} n\theta^{-n} x^{n-1}, & 0 < x < \theta, \\ 0, & \text{otherwise.} \end{cases}$$

We see by a similar argument that $M_n$ is complete, which is the same as
saying that $\{f_n(x \mid \theta) ; \theta > 0\}$ is a complete family of densities. Clearly $M_n$ is
sufficient.


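Example 19. Let $X_1, X_2, \ldots, X_n$ be a sample from pmf

\[ P_N(x) = \begin{cases} \dfrac{1}{N}, & x = 1, 2, \ldots, N, \\[1ex] 0, & \text{otherwise.} \end{cases} \]

We first show that the family of pmf's $\{P_N, N \ge 1\}$ is complete. We have

\[ E_N g(X) = \frac{1}{N}\sum_{k=1}^N g(k) = 0 \quad \text{for all } N \ge 1, \]

and this happens if and only if g(k) = 0, k = 1, 2, ..., N. Next we consider
the family of pmf's of $M_n = \max(X_1, \ldots, X_n)$. The pmf of $M_n$ is given by

\[ P_N^{(n)}(x) = \frac{x^n - (x-1)^n}{N^n}, \quad x = 1, 2, \ldots, N. \]

Also,

\[ E_N g(M_n) = \sum_{k=1}^N g(k)\, \frac{k^n - (k-1)^n}{N^n} = 0 \quad \text{for all } N \ge 1. \]

Taking N = 1,

\[ E_1 g(M_n) = g(1) = 0 \]

implies g(1) = 0. Again,

\[ E_2 g(M_n) = \frac{g(1)}{2^n} + g(2)\Bigl(1 - \frac{1}{2^n}\Bigr) = 0, \]

so that g(2) = 0.

Using an induction argument, we conclude that g(1) = g(2) = ... =
g(N) = 0 and hence g(x) = 0. It follows that $P_N^{(n)}$ is a complete family of
distributions, and $M_n$ is a complete sufficient statistic.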
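
The induction step of Example 19 can be traced numerically. The following Python sketch is an illustration added here, not part of the original argument; the function names `pmf_max` and `solve_zero_expectation` and the choices N up to 6, n = 3 are assumptions made for the demonstration. It solves $E_N g(M_n) = 0$ successively for N = 1, 2, ... in exact rational arithmetic, and every g(N) comes out zero.

```python
from fractions import Fraction

def pmf_max(x, N, n):
    """P_N{M_n = x} for the maximum of n iid draws uniform on {1, ..., N}."""
    return Fraction(x**n - (x - 1)**n, N**n)

def solve_zero_expectation(N_max, n):
    """Solve E_N g(M_n) = 0 for N = 1, ..., N_max by forward substitution."""
    g = {}
    for N in range(1, N_max + 1):
        # Contribution of the values g(1), ..., g(N-1) found so far.
        partial = sum(g[k] * pmf_max(k, N, n) for k in range(1, N))
        # pmf_max(N, N, n) > 0, so g(N) is uniquely determined.
        g[N] = -partial / pmf_max(N, N, n)
    return g

g_values = solve_zero_expectation(N_max=6, n=3)
print(g_values)
```

Because the coefficient of g(N) in the Nth equation is strictly positive, the system is triangular, which is exactly why the induction works.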

Now suppose that we exclude the value $N = n_0$ for some fixed $n_0 \ge 1$
from the family $\{P_N: N \ge 1\}$. Let us write $\mathcal{P} = \{P_N: N \ge 1, N \ne n_0\}$. Then
$\mathcal{P}$ is not complete. We ask the reader to show that the class of all functions
g such that $E_P g(X) = 0$ for all $P \in \mathcal{P}$ consists of functions of the form

\[ g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1,\ n_0 + 2,\ n_0 + 3, \ldots, \\ c, & k = n_0, \\ -c, & k = n_0 + 1, \end{cases} \]

where c is a constant, $c \ne 0$.
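
As a quick check of the displayed form, the sketch below (an added illustration; the values $n_0 = 4$ and c = 3 are arbitrary assumptions) computes $E_N g(X)$ exactly and confirms that it vanishes for every $N \ne n_0$ but not for $N = n_0$.

```python
from fractions import Fraction

def g(k, n0, c):
    """The function of Example 19: c at n0, -c at n0 + 1, zero elsewhere."""
    if k == n0:
        return c
    if k == n0 + 1:
        return -c
    return 0

def expectation(N, n0, c):
    """E_N g(X) when X is uniform on {1, ..., N}."""
    return sum(Fraction(g(k, n0, c), N) for k in range(1, N + 1))

n0, c = 4, 3  # illustrative values only
table = {N: expectation(N, n0, c) for N in range(1, 11)}
print(table)
```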




Remark 12. Completeness is a property of a family of distributions. In
Remark 11 we saw that if a statistic is sufficient for a class of distributions
it is sufficient for any subclass of those distributions. Completeness works in
the opposite direction. Example 19 shows that the exclusion of even one
member from the family $\{P_N: N \ge 1\}$ destroys completeness. If a family
$\mathcal{P}$ is complete, it is sometimes possible to conclude completeness for a larger
class, but we will not go into details here. See Fraser [33], page 25.


The following result covers a large class of probability distributions for 
which a complete sufficient statistic exists. 


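Theorem 4. Let $\{f_\theta: \theta \in \Theta\}$ be a k-parameter exponential family given by

\[ f_\theta(x) = \exp\Bigl\{ \sum_{j=1}^k Q_j(\theta)\, T_j(x) + D(\theta) + S(x) \Bigr\}, \tag{9} \]

where $\theta = (\theta_1, \theta_2, \ldots, \theta_k) \in \Theta$, an interval in $\mathcal{R}_k$, $T_1, T_2, \ldots, T_k$ and S are
defined on $\mathcal{R}_n$, $T = (T_1, T_2, \ldots, T_k)$, and $x = (x_1, x_2, \ldots, x_n)$, $k \le n$. Let
$Q = (Q_1, Q_2, \ldots, Q_k)$, and suppose that the range of Q contains an open
set in $\mathcal{R}_k$. Then

\[ T = (T_1(X), T_2(X), \ldots, T_k(X)) \]

is a complete sufficient statistic.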


Proof. For a complete proof in a general setting we refer the reader to
Lehmann [70], pages 132-133. Essentially, the unicity of the Laplace transform
is used on the probability distribution induced by T. We will content
ourselves here by proving the result for the k = 1 case when $f_\theta$ is a pmf.
Let us write $Q(\theta) = \theta$ in (9), and let $(\alpha, \beta) \subseteq \Theta$. We wish to show that

\[ E_\theta g(T(X)) = \sum_t g(t)\, P_\theta\{T(X) = t\} = \sum_t g(t) \exp\{\theta t + D(\theta) + S^*(t)\} = 0 \quad \text{for all } \theta \tag{10} \]

implies that g(t) = 0. (Note that we have used Theorem 5.5.1 in next to the
last equality.)

Let us write $x^+ = x$ if $x \ge 0$, $= 0$ if $x < 0$, and $x^- = -x$ if $x < 0$, $= 0$
if $x \ge 0$. Then $g(t) = g^+(t) - g^-(t)$, and both $g^+$ and $g^-$ are nonnegative
functions. In terms of $g^+$ and $g^-$, (10) is the same as

\[ \sum_t g^+(t)\, e^{\theta t + S^*(t)} = \sum_t g^-(t)\, e^{\theta t + S^*(t)} \tag{11} \]

for all $\theta$.
Let $\theta_0 \in (\alpha, \beta)$ be fixed, and write

\[ p^+(t) = \frac{g^+(t)\, e^{\theta_0 t + S^*(t)}}{\sum_t g^+(t)\, e^{\theta_0 t + S^*(t)}} \quad \text{and} \quad p^-(t) = \frac{g^-(t)\, e^{\theta_0 t + S^*(t)}}{\sum_t g^-(t)\, e^{\theta_0 t + S^*(t)}}. \tag{12} \]

Then both $p^+$ and $p^-$ are pmf's, and it follows from (11) that

\[ \sum_t e^{\delta t} p^+(t) = \sum_t e^{\delta t} p^-(t) \tag{13} \]

for all $\delta \in (\alpha - \theta_0, \beta - \theta_0)$. By the uniqueness of mgf's, (13) implies that

\[ p^+(t) = p^-(t) \quad \text{for all } t \]

and hence that $g^+(t) = g^-(t)$ for all t, which is equivalent to g(t) = 0 for
all t. Since T is clearly sufficient (by the factorization criterion), it is proved
that T is a complete sufficient statistic.


Example 20. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ rv's where both $\mu$ and $\sigma^2$
are unknown. We know that the family of distributions of $X = (X_1, \ldots, X_n)$
is a two-parameter exponential family with $T(X_1, \ldots, X_n) = (\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2)$.
From Theorem 4 it follows that T is a complete sufficient statistic. Examples
15 and 16 fall in the domain of Theorem 4.


PROBLEMS 8.3 


1. Suppose that $T_n$ is a sequence of estimates for parameter $\theta$ that satisfies the
conditions of Theorem 2. Then $T_n \xrightarrow{2} \theta$, that is, $T_n$ is squared error consistent for $\theta$.

If $T_n$ is consistent for $\theta$ and $|T_n - \theta| \le A < \infty$ for all $\theta$ and all
$(x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$, show that $T_n \xrightarrow{2} \theta$. If, however, $|T_n - \theta| \le A_n < \infty$, then
show that $T_n$ may not be squared error consistent for $\theta$.


2. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$, $\theta \in \Theta = (0, \infty)$. Let $M_n = \max
(X_1, X_2, \ldots, X_n)$. Show that $M_n \xrightarrow{2} \theta$. Write $Y_n = 2\bar{X}$. Is $Y_n$ consistent for $\theta$?


3. Let $X_1, X_2, \ldots, X_n$ be iid rv's with $EX_i = \mu$ and $EX_i^2 < \infty$. Show that
$T(X_1, X_2, \ldots, X_n) = 2[n(n+1)]^{-1} \sum_{i=1}^n i X_i$ is a consistent estimate for $\mu$.


4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$. Show that $T(X_1, X_2, \ldots, X_n) =
(\prod_{i=1}^n X_i)^{1/n}$ is a consistent estimate for $\theta e^{-1}$.


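5. Find a sufficient statistic in each of the following cases based on a random
sample of size n.

(a) $X \sim B(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known;
and (iii) $\alpha$, $\beta$ are both unknown.

(b) $X \sim G(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known;
and (iii) $\alpha$, $\beta$ are both unknown.

(c) $X \sim P_{N_1, N_2}(x)$, where

\[ P_{N_1, N_2}(x) = \frac{1}{N_2 - N_1}, \quad x = N_1 + 1, N_1 + 2, \ldots, N_2, \]

and $N_1$, $N_2$ ($N_1 < N_2$) are integers, when (i) $N_1$ is known, $N_2$ unknown;
(ii) $N_2$ known, $N_1$ unknown; and (iii) $N_1$, $N_2$ are both unknown.

(d) $X \sim f_\theta(x)$, where

\[ f_\theta(x) = \begin{cases} e^{-x + \theta}, & \theta < x < \infty, \\ 0, & \text{otherwise.} \end{cases} \]

(e) $X \sim f(x; \mu, \sigma)$, where

\[ f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\Bigl\{ -\frac{1}{2\sigma^2}(\log x - \mu)^2 \Bigr\}, \quad x > 0. \]

(f) $X \sim f_\theta(x)$, where

\[ f_\theta(x) = P_\theta\{X = x\} = c(\theta)\, 2^{-x/\theta}, \quad x = \theta, \theta + 1, \ldots, \quad \theta > 0. \]

(g) $X \sim P_{\theta, p}(x)$, where

\[ P_{\theta, p}(x) = (1 - p)\, p^{x - \theta}, \quad x = \theta, \theta + 1, \ldots, \quad 0 < p < 1, \]

when (i) p is known, $\theta$ unknown; (ii) p is unknown, $\theta$ known; and (iii) p,
$\theta$ are both unknown.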

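6. Let $X = (X_1, X_2, \ldots, X_n)$ be a sample from $\mathcal{N}(\alpha\sigma, \sigma^2)$, where $\alpha$ is a known
real number. Show that the statistic $T(X) = (\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2)$ is sufficient for
$\sigma$ but that the family of distributions of T(X) is not complete.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then $X = (X_1, X_2, \ldots, X_n)$ is
clearly sufficient for the family $\mathcal{N}(\mu, \sigma^2)$, $\mu \in \mathcal{R}$, $\sigma > 0$. Is the family of
distributions of X complete?

8. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(\theta - \tfrac12, \theta + \tfrac12)$, $\theta \in \mathcal{R}$. Show that
the statistic $T(X_1, \ldots, X_n) = (\min X_i, \max X_i)$ is sufficient for $\theta$ but not complete.

9. Let X be an rv with pmf

\[ P_\theta(X = -1) = \theta \quad \text{and} \quad P_\theta(X = x) = (1 - \theta)^2 \theta^x, \quad x = 0, 1, 2, \ldots. \]

Is $\{P_\theta: \theta \in (0, 1)\}$ a complete family?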

10. If T = g(U) and T is sufficient, then so also is U. 

11. Let $\Theta \subseteq \mathcal{R}$. We say that a parameter $\theta \in \Theta$ is a location parameter for the
df $F_\theta(x)$ of an rv X if $F_\theta(x) = F(x - \theta)$, where F is a df. We say that a positive
parameter $\theta \in \Theta$ is a scale parameter for the df $F_\theta(x)$ of an rv X if $F_\theta(x) = F(x/\theta)$,
where F is a df.

(a) $\theta$ is a location parameter if and only if the distribution of $X - \theta$ computed
under $\theta$, that is, $P_\theta(X - \theta \le x)$, is independent of $\theta$. Find a group of
transformations that leaves a location parameter family of distributions invariant.

(b) $\theta$ is a scale parameter if and only if the distribution of $X/\theta$ computed under
$\theta$, that is, $P_\theta(X/\theta \le x)$, is independent of $\theta$. Find a group of transformations
that leaves a scale parameter family invariant.

12. In Example 19 show that the class of all functions g for which $E_P g(X) = 0$
for all $P \in \mathcal{P}$ consists of functions of the form

\[ g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1,\ n_0 + 2,\ n_0 + 3, \ldots, \\ c, & k = n_0, \\ -c, & k = n_0 + 1, \end{cases} \]

where c is a constant.
13. For the class $\{F_{\theta_1}, F_{\theta_2}\}$ of two df's, where $F_{\theta_1}$ is $\mathcal{N}(0, 1)$ and $F_{\theta_2}$ is $\mathcal{C}(1, 0)$, find
a sufficient statistic.


14. Consider the class of hypergeometric probability distributions $\{P_D: D = 0,
1, 2, \ldots, N\}$, where

\[ P_D\{X = x\} = \binom{N}{n}^{-1} \binom{D}{x} \binom{N - D}{n - x}, \quad x = 0, 1, \ldots, \min(n, D). \]

Show that it is a complete class. If $\mathcal{P} = \{P_D: D = 0, 1, 2, \ldots, N, D \ne d, d$ integral,
$0 \le d \le N\}$, is $\mathcal{P}$ complete?

15. Is the family of distributions of the order statistic in sampling from a Poisson
distribution complete?

16. Let $(X_1, X_2, \ldots, X_n)$ be a random vector of the discrete type. Is the statistic
$T(X_1, \ldots, X_n) = (X_1, \ldots, X_{n-1})$ sufficient?


8.4 UNBIASED ESTIMATION 


In this section we describe yet another property of estimates and study some 
procedures for finding estimates possessing this property. 


Definition 1. Let $\{F_\theta, \theta \in \Theta\}$, $\Theta \subseteq \mathcal{R}$, be a nonempty set of probability
distribution functions. A Borel-measurable function T of $\mathcal{R}_n \to \Theta$ is said to
be an unbiased estimate of $\theta$ if

\[ E_\theta T = \theta \quad \text{for all } \theta \in \Theta. \tag{1} \]

Any function $d(\theta)$ for which there exists a T satisfying (1) is usually referred
to as an estimable function. An estimate that is not unbiased is called biased,
and the function $b(\theta, T)$, defined by

\[ b(\theta, T) = E_\theta T - \theta, \tag{2} \]

is called the bias of T.


Remark 1. Definition 1, in particular, requires that T be integrable (summable),
that is, $E_\theta T$ exist for every $\theta \in \Theta$. This definition can easily be extended
to the case where $\theta$ is a vector of parameters and T is a random vector.


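Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from some population
with finite mean. Then $\bar{X}$ is unbiased for the population mean. If the
population variance is finite, the sample variance $S^2$ is unbiased for the
population variance. In general, if the kth population moment $m_k$ exists, the kth
sample moment is unbiased for $m_k$.

Note that S is not, in general, unbiased for $\sigma$. If $X_1, X_2, \ldots, X_n$ are iid
$\mathcal{N}(\mu, \sigma^2)$ rv's we know that $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$. Therefore

\[ E\bigl(S\sqrt{n-1}/\sigma\bigr) = \int_0^\infty \sqrt{x}\, \frac{1}{2^{(n-1)/2}\, \Gamma[(n-1)/2]}\, x^{(n-1)/2 - 1} e^{-x/2}\, dx = \sqrt{2}\, \Gamma\Bigl(\frac{n}{2}\Bigr) \Bigl[\Gamma\Bigl(\frac{n-1}{2}\Bigr)\Bigr]^{-1}, \]

so that

\[ E_\sigma(S) = \sigma \sqrt{\frac{2}{n-1}}\; \Gamma\Bigl(\frac{n}{2}\Bigr) \Bigl[\Gamma\Bigl(\frac{n-1}{2}\Bigr)\Bigr]^{-1}. \]

The bias of S is given by

\[ b(\sigma, S) = \sigma \Bigl\{ \sqrt{\frac{2}{n-1}}\; \Gamma\Bigl(\frac{n}{2}\Bigr) \Bigl[\Gamma\Bigl(\frac{n-1}{2}\Bigr)\Bigr]^{-1} - 1 \Bigr\}. \]

If T is unbiased for $\theta$, g(T) is not, in general, an unbiased estimate of
$g(\theta)$ unless g is a linear function.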
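
To get a feel for the size of the bias of S in the normal case, the following Python sketch (an added illustration; the function names are ours) evaluates the factor $E_\sigma(S)/\sigma = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$ for several n. Log-gamma is used only to avoid overflow for large n.

```python
import math

def expected_s_ratio(n):
    """E(S)/sigma for a normal sample of size n >= 2."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def relative_bias(n):
    """b(sigma, S)/sigma, which does not depend on sigma."""
    return expected_s_ratio(n) - 1.0

for n in (2, 5, 10, 30, 100):
    print(n, round(expected_s_ratio(n), 5), round(relative_bias(n), 5))
```

The factor is below 1 for every n, so S underestimates $\sigma$ on the average, and the relative bias shrinks toward zero as n grows.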


Example 2. Unbiased estimates do not always exist. Consider an rv with
pmf b(1, p). Suppose that we wish to estimate $d(p) = p^2$. Then, in order that
T be unbiased for $p^2$, we must have

\[ p^2 = E_p T = p\,T(1) + (1 - p)\,T(0), \quad 0 \le p \le 1, \]

that is,

\[ p^2 = p\{T(1) - T(0)\} + T(0) \]

must hold for all p in the interval [0, 1], which is impossible. (If a convergent
power series vanishes in an open interval, each of the coefficients must be
0. See Theorem P.2.10.) See also Problem 1.


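Example 3. Sometimes an unbiased estimate may be absurd.
Let X be $P(\lambda)$, and $d(\lambda) = e^{-3\lambda}$. We show that $T(X) = (-2)^X$ is unbiased
for $d(\lambda)$. We have

\[ E_\lambda T(X) = e^{-\lambda} \sum_{x=0}^\infty \frac{(-2)^x \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^\infty \frac{(-2\lambda)^x}{x!} = e^{-\lambda} e^{-2\lambda} = d(\lambda). \]

However, $T(x) = (-2)^x > 0$ if x is even, and $< 0$ if x is odd, which is
absurd since $d(\lambda) > 0$.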
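
The unbiasedness claimed in Example 3 can be confirmed by summing the series numerically. The sketch below is an added illustration (the function name and the truncation at 200 terms are assumptions): it accumulates $e^{-\lambda}(-2\lambda)^x/x!$ term by term and compares the total with $e^{-3\lambda}$.

```python
import math

def expected_t(lam, terms=200):
    """Truncated series for E_lambda (-2)^X when X is Poisson(lam)."""
    total = 0.0
    term = math.exp(-lam)  # x = 0 term: e^{-lam} * (-2 lam)^0 / 0!
    for x in range(terms):
        total += term
        term *= (-2.0 * lam) / (x + 1)  # ratio of consecutive terms
    return total

for lam in (0.1, 0.5, 1.0, 2.0):
    print(lam, expected_t(lam), math.exp(-3 * lam))
```

The estimate itself takes the values 1, -2, 4, -8, ..., none of which lies in the range (0, 1) of the quantity being estimated.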

Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Then $\bar{X}$ is unbiased
for $\lambda$ and so also is $S^2$, since both the mean and the variance are equal to
$\lambda$. Indeed, $\alpha\bar{X} + (1 - \alpha)S^2$, $0 \le \alpha \le 1$, is unbiased for $\lambda$.


Let $\theta$ be estimable, and let T be an unbiased estimate of $\theta$. Let $T_1$ be
another unbiased estimate of $\theta$, different from T. This means that there
exists at least one $\theta$ such that $P_\theta\{T \ne T_1\} > 0$. In this case there exist
infinitely many unbiased estimates of $\theta$ of the form $\alpha T + (1 - \alpha)T_1$, $0 < \alpha < 1$.
It is therefore desirable to find a procedure to differentiate between these
estimates.


Definition 2. Let $\theta_0 \in \Theta$ and $U(\theta_0)$ be the class of all unbiased estimates T
of $\theta_0$ such that $E_{\theta_0} T^2 < \infty$. Then $T_0 \in U(\theta_0)$ is called a locally minimum
variance unbiased estimate (LMVUE) at $\theta_0$ if

\[ E_{\theta_0}(T_0 - \theta_0)^2 \le E_{\theta_0}(T - \theta_0)^2 \tag{3} \]

holds for all $T \in U(\theta_0)$.


Definition 3. Let U be the set of all unbiased estimates T of $\theta \in \Theta$ such that
$E_\theta T^2 < \infty$ for all $\theta \in \Theta$. An estimate $T_0 \in U$ is called a uniformly minimum
variance unbiased estimate (UMVUE) of $\theta$ if

\[ E_\theta(T_0 - \theta)^2 \le E_\theta(T - \theta)^2 \tag{4} \]

for all $\theta \in \Theta$ and every $T \in U$.


Remark 2. Let $a_1, a_2, \ldots, a_n$ be any set of real numbers with $\sum_{i=1}^n a_i = 1$.
Let $X_1, X_2, \ldots, X_n$ be independent rv's with common mean $\mu$ and variances
$\sigma_k^2$, k = 1, 2, ..., n. Then $T = \sum_{i=1}^n a_i X_i$ is an unbiased estimate of $\mu$
with variance $\sum_{i=1}^n a_i^2 \sigma_i^2$ (see the corollary to Theorem 4.6.5). T is called
a linear unbiased estimate of $\mu$. Linear unbiased estimates of $\mu$ that have
minimum variance (among all linear unbiased estimates) are called best
linear unbiased estimates (BLUE's). In Theorem 4.6.6 we have shown
that, if $X_i$ are iid rv's with common variance $\sigma^2$, the BLUE of $\mu$ is $\bar{X} = n^{-1}\sum_{i=1}^n X_i$.
If $X_i$ are independent with common mean $\mu$ but different
variances $\sigma_i^2$, the BLUE of $\mu$ is obtained if we choose $a_i$ proportional to
$1/\sigma_i^2$; then the minimum variance is H/n, where H is the harmonic mean
of $\sigma_1^2, \ldots, \sigma_n^2$. (See Example 4.6.4.)
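
The last statement of Remark 2 is easy to check on a small case. In the Python sketch below (an added illustration; the variances 1, 2, 4 are arbitrary and the function names are ours), the weights are taken proportional to $1/\sigma_i^2$ and the resulting variance is compared with H/n and with the variance of the equally weighted mean.

```python
from fractions import Fraction

def blue_weights(variances):
    """Weights a_i proportional to 1/sigma_i^2, normalized to sum to 1."""
    inverse = [1 / Fraction(v) for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]

def linear_estimate_variance(weights, variances):
    """Variance of sum a_i X_i for independent X_i: sum a_i^2 sigma_i^2."""
    return sum(a * a * Fraction(v) for a, v in zip(weights, variances))

def harmonic_mean(values):
    return len(values) / sum(1 / Fraction(v) for v in values)

variances = [1, 2, 4]  # illustrative sigma_i^2 values
weights = blue_weights(variances)
blue_var = linear_estimate_variance(weights, variances)
equal_var = linear_estimate_variance([Fraction(1, 3)] * 3, variances)
print(weights, blue_var, harmonic_mean(variances) / 3, equal_var)
```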


Remark 3. Sometimes the precision of an estimate T of parameter $\theta$ is
measured by the so-called mean square error (MSE). We say that an
estimate $T_0$ is at least as good as any other estimate T in the sense of the
MSE if

\[ E_\theta(T_0 - \theta)^2 \le E_\theta(T - \theta)^2 \quad \text{for all } \theta \in \Theta. \tag{5} \]

In general, a particular estimate will be better than another for some values
of $\theta$ and worse for others. Definitions 2 and 3 are special cases of this
concept if we restrict attention only to unbiased estimates.


The following result gives a necessary and sufficient condition for an
unbiased estimate to be a UMVUE.


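Theorem 1. Let U be the class of all unbiased estimates T of a parameter
$\theta \in \Theta$ with $E_\theta T^2 < \infty$ for all $\theta$, and suppose that U is nonempty. Let $U_0$
be the set of all unbiased estimates $\nu$ of 0, that is,

\[ U_0 = \{\nu: E_\theta \nu = 0,\ E_\theta \nu^2 < \infty \text{ for all } \theta \in \Theta\}. \]

Then $T_0 \in U$ is a UMVUE if and only if

\[ E_\theta(\nu T_0) = 0 \quad \text{for all } \theta \text{ and all } \nu \in U_0. \tag{6} \]


Proof. The conditions of the theorem guarantee the existence of $E_\theta(\nu T_0)$
for all $\theta$ and $\nu \in U_0$. Suppose that $T_0 \in U$ is a UMVUE and $E_{\theta_0}(\nu_0 T_0) \ne 0$
for some $\theta_0$ and some $\nu_0 \in U_0$. Then $T_0 + \lambda\nu_0 \in U$ for all real $\lambda$. If $E_{\theta_0}\nu_0^2 = 0$,
then $E_{\theta_0}(\nu_0 T_0) = 0$ must hold since $P_{\theta_0}\{\nu_0 = 0\} = 1$. Let $E_{\theta_0}\nu_0^2 > 0$. Choose
$\lambda_0 = -E_{\theta_0}(T_0\nu_0)/E_{\theta_0}\nu_0^2$. Then

\[ E_{\theta_0}(T_0 + \lambda_0\nu_0)^2 = E_{\theta_0}T_0^2 - \frac{[E_{\theta_0}(\nu_0 T_0)]^2}{E_{\theta_0}\nu_0^2} < E_{\theta_0}T_0^2. \tag{7} \]

Since $T_0 + \lambda_0\nu_0 \in U$ and $T_0 \in U$, it follows from (7) that

\[ \operatorname{var}_{\theta_0}(T_0 + \lambda_0\nu_0) < \operatorname{var}_{\theta_0}(T_0), \tag{8} \]

which is a contradiction. It follows that (6) holds.
Conversely, let (6) hold for some $T_0 \in U$, all $\theta \in \Theta$ and all $\nu \in U_0$, and let
$T \in U$. Then $T_0 - T \in U_0$, and for every $\theta$

\[ E_\theta\{T_0(T_0 - T)\} = 0. \]

We have

\[ E_\theta T_0^2 = E_\theta(T T_0) \le (E_\theta T_0^2)^{1/2} (E_\theta T^2)^{1/2} \]

by the Cauchy-Schwarz inequality. If $E_\theta T_0^2 = 0$, then $P(T_0 = 0) = 1$ and
there is nothing to prove. Otherwise

\[ (E_\theta T_0^2)^{1/2} \le (E_\theta T^2)^{1/2}, \]

or $\operatorname{var}_\theta(T_0) \le \operatorname{var}_\theta(T)$. Since T is arbitrary, the proof is complete.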
Theorem 2. Let U be the nonempty class of unbiased estimates as defined
in Theorem 1. Then there exists at most one UMVUE for $\theta$.

Proof. If T and $T_0 \in U$ are both UMVUE's, then $T - T_0 \in U_0$ and

\[ E_\theta\{T_0(T - T_0)\} = 0 \quad \text{for all } \theta \in \Theta, \]

that is, $E_\theta T_0^2 = E_\theta(T T_0)$, and it follows that

\[ \operatorname{cov}(T, T_0) = \operatorname{var}_\theta(T_0) \quad \text{for all } \theta. \]

Since $T_0$ and T are both UMVUE's, $\operatorname{var}_\theta(T) = \operatorname{var}_\theta(T_0)$, and it follows
that the correlation coefficient between T and $T_0$ is 1. This implies that
$P_\theta\{aT + bT_0 = 0\} = 1$ for some a, b and all $\theta \in \Theta$. Since T and $T_0$ are
both unbiased for $\theta$, we must have $P_\theta\{T = T_0\} = 1$ for all $\theta$.


Remark 4. Both Theorems 1 and 2 have analogues for LMVUE's at $\theta_0 \in \Theta$,
$\theta_0$ fixed.


Theorem 3. If UMVUE's $T_i$ exist for real functions $d_i$, i = 1, 2, of $\theta$, they
also exist for $\lambda d_i$ ($\lambda$ real), as well as for $d_1 + d_2$, and are given by $\lambda T_i$ and
$T_1 + T_2$, respectively.


Proof. The proof is left as an exercise. 


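Theorem 4. Let $\{T_n\}$ be a sequence of UMVUE's and T be a statistic with
$E_\theta T^2 < \infty$ and such that $E_\theta\{T_n - T\}^2 \to 0$ as $n \to \infty$ for all $\theta \in \Theta$. Then T
is also the UMVUE.


Proof. That T is unbiased follows from $|E_\theta T - \theta| \le E_\theta|T - T_n| \le
E_\theta^{1/2}\{T_n - T\}^2$. For all $\nu \in U_0$, all $\theta$, and every n = 1, 2, ...,

\[ E_\theta(T_n \nu) = 0 \]

by Theorem 1. Therefore

\[ E_\theta(\nu T) = E_\theta(\nu T) - E_\theta(\nu T_n) = E_\theta[\nu(T - T_n)] \]

and

\[ |E_\theta(\nu T)| \le (E_\theta \nu^2)^{1/2} [E_\theta(T - T_n)^2]^{1/2} \to 0 \quad \text{as } n \to \infty \]

for all $\theta$ and all $\nu \in U_0$. Thus

\[ E_\theta(\nu T) = 0 \quad \text{for all } \nu \in U_0, \text{ all } \theta \in \Theta, \]

and, by Theorem 1, T must be the UMVUE.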


Example 5. Let $X \sim P(\lambda)$. Then X is the UMVUE of $\lambda$. Surely X is unbiased.
Let g be an unbiased estimate of 0. Then T(X) = X + g(X) is unbiased for
$\lambda$. But the family of pmf's $\{P(\lambda): \lambda > 0\}$ is complete. It follows that

\[ E_\lambda g(X) = 0 \ \text{ for all } \lambda > 0 \ \Rightarrow\ g(x) = 0 \ \text{ for } x = 0, 1, 2, \ldots. \]

Hence X must be the UMVUE of $\lambda$.


Example 6. Sometimes an estimate with larger variance may be preferable.
Let X be a $G(1, 1/\beta)$ rv. X is usually taken as a good model to describe
the time to failure of a piece of equipment. Let $X_1, X_2, \ldots, X_n$ be a sample
of n observations on X. Then $\bar{X}$ is unbiased for $EX = 1/\beta$ with variance
$1/(n\beta^2)$. ($\bar{X}$ is actually the UMVUE for $1/\beta$.) Now consider $M_n =
\min(X_1, X_2, \ldots, X_n)$. Then $nM_n$ is unbiased for $1/\beta$ with variance $1/\beta^2$, and
it has a larger variance than $\bar{X}$. However, if the length of time is of importance,
$nM_n$ may be preferable to $\bar{X}$, since to observe $nM_n$ one needs to wait
only until the first piece of equipment fails, whereas to compute $\bar{X}$ one
would have to wait until all the n observations $X_1, X_2, \ldots, X_n$ are available.


Theorem 5. If a sample consists of n independent observations
$X_1, X_2, \ldots, X_n$ from the same distribution, the UMVUE, if it exists, is a
symmetric function of the X's.


Proof. The proof is left as an exercise.


The converse of Theorem 5 is not true. If $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ rv's,
$\lambda > 0$, both $\bar{X}$ and $S^2$ are unbiased for $\lambda$. But $\bar{X}$ is the UMVUE, whereas $S^2$
is not.


We now turn our attention to some methods for finding UMVUE's. 


Theorem 6 (Rao [96], Blackwell [9]). Let $\{F_\theta: \theta \in \Theta\}$ be a family of
probability df's, and h be any statistic in U, where U is the (nonempty) class
of all unbiased estimates of $\theta$ with $E_\theta h^2 < \infty$. Let T be a sufficient statistic
for $\{F_\theta, \theta \in \Theta\}$. Then the conditional expectation $E_\theta\{h \mid T\}$ is independent of
$\theta$ and is an unbiased estimate of $\theta$. Moreover,

\[ E_\theta(E\{h \mid T\} - \theta)^2 \le E_\theta(h - \theta)^2 \quad \text{for all } \theta \in \Theta. \tag{9} \]

The equality in (9) holds if and only if $h = E\{h \mid T\}$ (that is,
$P_\theta\{h = E\{h \mid T\}\} = 1$ for all $\theta$).

Proof. We have

\[ E_\theta\{E\{h \mid T\}\} = E_\theta h = \theta. \]

It is therefore sufficient to show that

\[ E_\theta\{E\{h \mid T\}\}^2 \le E_\theta h^2 \quad \text{for all } \theta \in \Theta. \tag{10} \]

But $E_\theta h^2 = E_\theta\{E\{h^2 \mid T\}\}$, so that it will be sufficient to show that

\[ [E\{h \mid T\}]^2 \le E\{h^2 \mid T\}. \tag{11} \]

By the Cauchy-Schwarz inequality

\[ E^2\{h \mid T\} \le E\{h^2 \mid T\}\, E\{1 \mid T\}, \]

and (11) follows. The equality holds in (9) if and only if

\[ E_\theta[E\{h \mid T\}]^2 = E_\theta h^2, \tag{12} \]

that is,

\[ E_\theta[E\{h^2 \mid T\} - E^2\{h \mid T\}] = 0, \]

which is the same as

\[ E_\theta\{\operatorname{var}\{h \mid T\}\} = 0. \]

This happens if and only if $\operatorname{var}\{h \mid T\} = 0$, that is, if and only if

\[ E\{h^2 \mid T\} = E^2\{h \mid T\}, \]

as will be the case if and only if h is a function of T. Thus $h = E\{h \mid T\}$.
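
A small exact computation illustrates Theorem 6. The following Python sketch is an added illustration under assumed choices: a sample of n = 4 iid b(1, p) rv's with p = 1/3, the crude unbiased estimate $h = X_1$, and the sufficient statistic $T = \sum X_i$. It enumerates the sample space, computes $E\{h \mid T = t\}$ from the joint pmf, and compares the two variances.

```python
from fractions import Fraction
from itertools import product

def rao_blackwell_bernoulli(n, p):
    """Condition h = X_1 on T = sum X_i for n iid b(1, p) observations."""
    p = Fraction(p)
    # Joint pmf of (X_1, ..., X_n) on {0, 1}^n.
    joint = {xs: p**sum(xs) * (1 - p)**(n - sum(xs))
             for xs in product((0, 1), repeat=n)}
    cond = {}
    for t in range(n + 1):
        prob_t = sum(pr for xs, pr in joint.items() if sum(xs) == t)
        cond[t] = sum(xs[0] * pr for xs, pr in joint.items()
                      if sum(xs) == t) / prob_t
    # Both estimates have mean p, so variances are taken about p.
    var_h = sum((xs[0] - p)**2 * pr for xs, pr in joint.items())
    var_rb = sum((cond[sum(xs)] - p)**2 * pr for xs, pr in joint.items())
    return cond, var_h, var_rb

cond, var_h, var_rb = rao_blackwell_bernoulli(4, Fraction(1, 3))
print(cond, var_h, var_rb)
```

In this run the conditional expectation equals t/n whatever the value of p, and the variance drops from p(1 - p) to p(1 - p)/n.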


Theorem 6 is applied along with completeness to yield the following 
powerful result. 


Theorem 7 (Lehmann-Scheffé [69]). If T is a complete sufficient statistic
and there exists an unbiased estimate h of $\theta$, there exists a unique UMVUE
of $\theta$, which is given by $E\{h \mid T\}$.


Proof. If $h_1, h_2 \in U$, then $E\{h_1 \mid T\}$ and $E\{h_2 \mid T\}$ are both unbiased and

\[ E_\theta[E\{h_1 \mid T\} - E\{h_2 \mid T\}] = 0 \quad \text{for all } \theta \in \Theta. \]

Since T is a complete sufficient statistic, it follows that $E\{h_1 \mid T\} = E\{h_2 \mid T\}$.
By Theorem 6, $E\{h \mid T\}$ is the UMVUE.


Remark 5. According to Theorem 6, we should restrict our search to Borel-
measurable functions of a sufficient statistic (whenever it exists). According
to Theorem 7, if a complete sufficient statistic T exists, all we need to do
is to find a Borel-measurable function of T that is unbiased. If a complete
sufficient statistic does not exist, an UMVUE may still exist (see Example
10).

Example 7. Let $X_1, X_2, \ldots, X_n$ be $\mathcal{N}(\theta, 1)$. $X_1$ is unbiased for $\theta$. However,
$\bar{X} = n^{-1}\sum_{i=1}^n X_i$ is a complete sufficient statistic, so that $E\{X_1 \mid \bar{X}\}$ is the
UMVUE.

We will show that $E\{X_1 \mid \bar{X}\} = \bar{X}$. Let $Y = n\bar{X}$. Then Y is $\mathcal{N}(n\theta, n)$, $X_1$
is $\mathcal{N}(\theta, 1)$, and $(X_1, Y)$ is a bivariate normal rv with variance-covariance
matrix $\begin{pmatrix} 1 & 1 \\ 1 & n \end{pmatrix}$. Therefore

\[ E\{X_1 \mid y\} = EX_1 + \frac{\operatorname{cov}(X_1, Y)}{\operatorname{var}(Y)}\,(y - EY) = \theta + \frac{1}{n}(y - n\theta) = \frac{y}{n}, \]

as asserted.
If we let $d(\theta) = \theta^2$, we can show similarly that $\bar{X}^2 - 1/n$ is the UMVUE
for $d(\theta)$. Note that $\bar{X}^2 - 1/n$ may occasionally be negative, so that an
UMVUE for $\theta^2$ is not very sensible in this case.
Example 8. Let $X_1, X_2, \ldots, X_n$ be iid b(1, p) rv's. Then $T = \sum_{i=1}^n X_i$ is a
complete sufficient statistic. The UMVUE for p is clearly $\bar{X}$. To find the
UMVUE for $d(p) = p(1 - p)$, we have $E(nT) = n^2 p$, $ET^2 = np + n(n - 1)p^2$,
so that $E(nT - T^2) = n(n - 1)p(1 - p)$, and it follows that $(nT - T^2)/[n(n - 1)]$
is the UMVUE for $d(p) = p(1 - p)$.
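
The unbiasedness of $(nT - T^2)/[n(n - 1)]$ can be verified exactly by summing over the binomial pmf of T. The Python sketch below is an added check; the values n = 5 and p = 2/7 are arbitrary.

```python
from fractions import Fraction
from math import comb

def expected_estimate(n, p):
    """E_p[(nT - T^2) / (n(n-1))] with T ~ b(n, p), computed exactly."""
    p = Fraction(p)
    total = Fraction(0)
    for t in range(n + 1):
        prob = comb(n, t) * p**t * (1 - p)**(n - t)
        estimate = Fraction(n * t - t * t, n * (n - 1))
        total += estimate * prob
    return total

p = Fraction(2, 7)
print(expected_estimate(5, p), p * (1 - p))
```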
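Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then $(\bar{X}, S^2)$ is a
complete sufficient statistic for $(\mu, \sigma^2)$. $\bar{X}$ is the UMVUE for $\mu$, and $S^2$ is
the UMVUE for $\sigma^2$. Also k(n)S is the UMVUE for $\sigma$, where
$k(n) = \sqrt{(n - 1)/2}\; \Gamma[(n - 1)/2]/\Gamma(n/2)$. We wish to find the UMVUE
for the pth quantile $\mathfrak{z}_p$. We have

\[ p = P\{X \le \mathfrak{z}_p\} = P\Bigl\{Z \le \frac{\mathfrak{z}_p - \mu}{\sigma}\Bigr\}, \]

where Z is $\mathcal{N}(0, 1)$. Thus $\mathfrak{z}_p = \sigma z_{1-p} + \mu$, and the UMVUE is

\[ T(X_1, X_2, \ldots, X_n) = z_{1-p}\, k(n)\, S + \bar{X}. \]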

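Example 10 (Stigler [127]). We return to Example 8.3.19. We have seen that
the family $\{P_N^{(n)}: N \ge 1\}$ of pmf's of $M_n = \max_{1 \le i \le n} X_i$ is complete and $M_n$ is
sufficient for N. Now $EX_1 = (N + 1)/2$, so that $T(X_1) = 2X_1 - 1$ is unbiased
for N. It follows from Theorem 7 that $E\{T(X_1) \mid M_n\}$ is the UMVUE of N.
We have

\[ P\{X_1 = x_1 \mid M_n = y\} = \begin{cases} \dfrac{y^{n-1} - (y-1)^{n-1}}{y^n - (y-1)^n} & \text{if } x_1 = 1, 2, \ldots, y - 1, \\[2ex] \dfrac{y^{n-1}}{y^n - (y-1)^n} & \text{if } x_1 = y. \end{cases} \]

Thus

\[ E\{T(X_1) \mid M_n = y\} = \frac{y^{n-1} - (y-1)^{n-1}}{y^n - (y-1)^n} \sum_{x_1 = 1}^{y-1} (2x_1 - 1) + (2y - 1)\, \frac{y^{n-1}}{y^n - (y-1)^n} = \frac{y^{n+1} - (y-1)^{n+1}}{y^n - (y-1)^n} \]

is the UMVUE of N.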
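
The algebra of Example 10 can be checked mechanically. The following Python sketch (an added illustration; the function names and the ranges of y, n, N tried are our choices) recomputes $E\{T(X_1) \mid M_n = y\}$ from the conditional pmf, compares it with the closed form, and confirms that the closed form has expectation N under the pmf of $M_n$.

```python
from fractions import Fraction

def cond_pmf(x1, y, n):
    """P{X_1 = x1 | M_n = y} as displayed in Example 10."""
    denom = y**n - (y - 1)**n
    if 1 <= x1 < y:
        return Fraction(y**(n - 1) - (y - 1)**(n - 1), denom)
    if x1 == y:
        return Fraction(y**(n - 1), denom)
    return Fraction(0)

def umvue_by_conditioning(y, n):
    """E{2 X_1 - 1 | M_n = y} computed as a finite sum."""
    return sum((2 * x1 - 1) * cond_pmf(x1, y, n) for x1 in range(1, y + 1))

def umvue_closed_form(y, n):
    return Fraction(y**(n + 1) - (y - 1)**(n + 1), y**n - (y - 1)**n)

def expected_umvue(N, n):
    """E_N of the closed form, using P_N{M_n = y} = (y^n - (y-1)^n) / N^n."""
    return sum(umvue_closed_form(y, n) * Fraction(y**n - (y - 1)**n, N**n)
               for y in range(1, N + 1))

print(umvue_closed_form(4, 3), expected_umvue(7, 3))
```

The expectation telescopes to $N^{n+1}/N^n = N$, which is what the exact arithmetic returns.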


If we consider the family $\mathcal{P}$ instead, we have seen (Example 8.3.19 and
Problem 8.3.12) that $\mathcal{P}$ is not complete. The UMVUE for the family
$\{P_N: N \ge 1\}$ is $T(X_1) = 2X_1 - 1$, which is not the UMVUE for $\mathcal{P}$. The
UMVUE for $\mathcal{P}$ is, in fact, given by

\[ T_1(k) = \begin{cases} 2k - 1, & k \ne n_0,\ k \ne n_0 + 1, \\ 2n_0, & k = n_0,\ k = n_0 + 1. \end{cases} \]

The reader is asked to check that $T_1$ has covariance 0 with all unbiased
estimates g of 0 that are of the form described in Example 8.3.19 and Problem
8.3.12, and hence Theorem 1 implies that $T_1$ is the UMVUE. Since
$E_{n_0} T_1(X_1) = n_0 + 1/n_0$, $T_1$ is not even unbiased for the family $\{P_N: N \ge 1\}$.
The minimum variance is given by

\[ \operatorname{var}_N(T_1(X_1)) = \begin{cases} \operatorname{var}_N(T(X_1)) & \text{if } N < n_0, \\[1ex] \operatorname{var}_N(T(X_1)) - \dfrac{2}{N} & \text{if } N > n_0. \end{cases} \]
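
The statements about $T_1$ can be verified by direct enumeration. In the Python sketch below (an added illustration; the excluded value $n_0 = 5$ and the range of N are arbitrary assumptions), the mean and variance of $T(X_1) = 2X_1 - 1$ and of $T_1(X_1)$ are computed exactly under the uniform pmf on {1, ..., N}.

```python
from fractions import Fraction

def T(k):
    return 2 * k - 1

def T1(k, n0):
    """The estimate T_1 of Example 10 for the family with N = n0 excluded."""
    return 2 * n0 if k in (n0, n0 + 1) else 2 * k - 1

def mean_and_variance(estimate, N):
    """Exact mean and variance of estimate(X), X uniform on {1, ..., N}."""
    values = [Fraction(estimate(k)) for k in range(1, N + 1)]
    mean = sum(values) / N
    variance = sum(v * v for v in values) / N - mean**2
    return mean, variance

n0 = 5  # illustrative excluded value
summary = {N: (mean_and_variance(T, N),
               mean_and_variance(lambda k: T1(k, n0), N))
           for N in range(1, 13)}
for N, ((m, v), (m1, v1)) in summary.items():
    print(N, m, v, m1, v1)
```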


PROBLEMS 8.4 


1. Let $X_1, X_2, \ldots, X_n$ ($n \ge 2$) be a sample from b(1, p). Find an unbiased estimate
for $d(p) = p^2$.

2. Let $X_1, X_2, \ldots, X_n$ ($n \ge 2$) be a sample from $\mathcal{N}(\mu, \sigma^2)$. Find an unbiased
estimate for $\sigma^p$, where $p + n > 1$. Find a minimum MSE estimate of $\sigma^p$.


3. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ rv's. Find a minimum MSE estimate of the
form $\alpha S^2$ for the parameter $\sigma^2$. Compare the variances of the minimum MSE
estimate and the obvious estimate $S^2$.

4. Let $X \sim b(1, \theta^2)$. Does there exist an unbiased estimate of $\theta$?

5. Let $X \sim P(\lambda)$. Does there exist an unbiased estimate of $d(\lambda) = \lambda^{-1}$?

6. Let $X_1, X_2, \ldots, X_n$ be a sample from b(1, p), 0 < p < 1, and 0 < s < n be an
integer. Find the UMVUE for (a) $d(p) = p^s$, (b) $d(p) = p^s + (1 - p)^{n-s}$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with mean $\theta$ and finite
variance, and T be an estimate of $\theta$ of the form $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n \alpha_i X_i$. If
T is an unbiased estimate of $\theta$ that has minimum variance and T' is another
linear unbiased estimate of $\theta$, then

\[ \operatorname{cov}_\theta(T, T') = \operatorname{var}_\theta(T). \]

8. Let $T_1$, $T_2$ be two unbiased estimates having common variance $\alpha\sigma^2$ ($\alpha > 1$),
where $\sigma^2$ is the variance of the UMVUE. Show that the correlation coefficient
between $T_1$ and $T_2$ is $\ge (2 - \alpha)/\alpha$.


9. Let $X \sim NB(1; \theta)$ and $d(\theta) = P_\theta\{X = 0\}$. Let $X_1, X_2, \ldots, X_n$ be a sample on X.
Find the UMVUE of $d(\theta)$.

10. This example covers most discrete distributions. Let $X_1, X_2, \ldots, X_n$ be a sample
from pmf

\[ P_\theta\{X = x\} = \frac{a(x)\,\theta^x}{f(\theta)}, \quad x = 0, 1, 2, \ldots, \]

where $\theta > 0$, $a(x) > 0$, $f(\theta) = \sum_{x=0}^\infty a(x)\theta^x$, $a(0) = 1$, and let $T = X_1 + X_2 + \cdots + X_n$.
Write

\[ c(t, n) = \sum_{\substack{x_1, x_2, \ldots, x_n \\ \text{with } \sum_{i=1}^n x_i = t}} \ \prod_{i=1}^n a(x_i). \]

Show that T is a complete sufficient statistic for $\theta$, and that the UMVUE for
$d(\theta) = \theta^r$ (r > 0 is an integer) is given by

\[ Y_r(t) = \begin{cases} 0 & \text{if } t < r, \\[1ex] \dfrac{c(t - r, n)}{c(t, n)} & \text{if } t \ge r. \end{cases} \]

(Roy and Mitra [105])


11. Let X be a hypergeometric rv with pmf

\[ P_M\{X = x\} = \binom{N}{n}^{-1} \binom{M}{x} \binom{N - M}{n - x}, \]

where $\max(0, M + n - N) \le x \le \min(M, n)$.

(a) Find the UMVUE for M when N is assumed to be known.

(b) Does there exist an unbiased estimate of N (M known)?

12. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\lambda)$ rv's, $\lambda > 0$. Find the UMVUE of $P_\lambda\{X_1 \le t_0\}$,
where $t_0 > 0$ is a fixed real number.

13. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Let $\psi(\lambda) = \sum_{k=0}^\infty c_k \lambda^k$ be a
parametric function. Find the UMVUE for $\psi(\lambda)$. In particular, find the UMVUE
for (a) $\psi(\lambda) = 1/(1 - \lambda)$, (b) $\psi(\lambda) = \lambda^s$ for some fixed integer s > 0, (c) $\psi(\lambda) =
P_\lambda\{X = 0\}$, and (d) $\psi(\lambda) = P_\lambda\{X = 0 \text{ or } 1\}$.

14. Let $X_1, X_2, \ldots, X_n$ be a sample from pmf

\[ P_N(x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N. \]

Let $\psi(N)$ be some function of N. Find the UMVUE of $\psi(N)$.

15. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find the UMVUE of
$\psi(\lambda) = P_\lambda\{X = k\}$, where k is a fixed positive integer.

16. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal
population with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Assume that $\mu_1 = \mu_2 = \mu$, and
it is required to find an unbiased estimate of $\mu$. Since a complete sufficient statistic
does not exist, consider the class of all linear unbiased estimates

\[ \hat{\mu}(\alpha) = \alpha\bar{X} + (1 - \alpha)\bar{Y}. \]

(a) Find the variance of $\hat{\mu}$.
(b) Choose $\alpha = \alpha_0$ to minimize $\operatorname{var}(\hat{\mu})$, and consider the estimate

\[ \hat{\mu}_0 = \alpha_0\bar{X} + (1 - \alpha_0)\bar{Y}. \]

Compute $\operatorname{var}(\hat{\mu}_0)$. If $\sigma_1 = \sigma_2$, the BLUE of $\mu$ (in the sense of minimum variance)
is

\[ \hat{\mu}_1 = \frac{\bar{X} + \bar{Y}}{2}, \]

irrespective of whether $\sigma_1$ and $\rho$ are known or unknown.
(c) If $\sigma_1 \ne \sigma_2$ and $\rho$, $\sigma_1$, $\sigma_2$ are unknown, replace these values in $\alpha_0$ by their
corresponding estimates. Let $\hat{\alpha}$ denote the resulting estimate of $\alpha_0$.
Show that

\[ \hat{\mu}_2 = \bar{Y} + (\bar{X} - \bar{Y})\hat{\alpha} \]

is an unbiased estimate of $\mu$.
17. Prove Theorem 3.
18. Prove Theorem 5.
19. In Example 10 show that $T_1$ is the UMVUE for N (restricted to the family $\mathcal{P}$),
and compute the minimum variance.
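20. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate population with finite
variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and covariance $\gamma$. Show that

\[ \operatorname{var}(S_{11}) = \frac{1}{n}\Bigl( \mu_{22} - \frac{n - 2}{n - 1}\,\gamma^2 + \frac{\sigma_1^2\sigma_2^2}{n - 1} \Bigr), \]

where $\mu_{22} = E[(X - EX)^2 (Y - EY)^2]$. It is assumed that appropriate order moments
exist.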

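21. Suppose that a random sample is taken on (X, Y) and it is desired to estimate
$\gamma$, the unknown covariance between X and Y. Suppose that for some reason a set
S of n observations is available on both X and Y, an additional $n_1 - n$ observations
are available on X but the corresponding Y values are missing, and an additional
$n_2 - n$ observations of Y are available for which the X values are missing. Let $S_1$
be the set of all $n_1$ ($\ge n$) X values, and $S_2$, the set of all $n_2$ ($\ge n$) Y values, and
write

\[ \hat{X} = \frac{\sum_{j \in S_1} X_j}{n_1}, \quad \hat{Y} = \frac{\sum_{j \in S_2} Y_j}{n_2}, \quad \bar{X} = \frac{\sum_{i \in S} X_i}{n}, \quad \bar{Y} = \frac{\sum_{i \in S} Y_i}{n}. \]

Show that

\[ \hat{\gamma} = \frac{n_1 n_2}{n(n_1 n_2 - n_1 - n_2 + n)} \sum_{i \in S} (X_i - \hat{X})(Y_i - \hat{Y}) \]

is an unbiased estimate of $\gamma$. Find the variance of $\hat{\gamma}$, and show that $\operatorname{var}(\hat{\gamma}) \le
\operatorname{var}(S_{11})$, where $S_{11}$ is the usual unbiased estimate of $\gamma$ based on the n observations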
in S. (Boas [10])


8.5 UNBIASED ESTIMATION (CONTINUED):
A LOWER BOUND FOR THE VARIANCE OF AN ESTIMATE


In this section we consider two inequalities, each of which provides a lower
bound for the variance of an estimate. These inequalities can sometimes be
used to show that an unbiased estimate is the UMVUE. We first consider
an inequality due to Fréchet, Cramér, and Rao (the FCR inequality).


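Theorem 1 (Fréchet [34], Cramér [19], Rao [95]). Let $\Theta \subseteq \mathcal{R}$ be an open interval
of the real line, and $\{f_\theta: \theta \in \Theta\}$ be a family of pdf's or pmf's. Assume
that the set $\{f_\theta(x) = 0\}$ is independent of $\theta$. For every $\theta$
let $\partial f_\theta(x)/\partial\theta$ be defined. Suppose that

\[ \begin{aligned} \frac{d}{d\theta} \int f_\theta(x)\, dx &= \int \frac{\partial f_\theta(x)}{\partial\theta}\, dx = 0 && \text{if } f_\theta \text{ is a pdf,} \\ \frac{d}{d\theta} \sum_x f_\theta(x) &= \sum_x \frac{\partial f_\theta(x)}{\partial\theta} = 0 && \text{if } f_\theta \text{ is a pmf,} \end{aligned} \tag{1} \]

for every $\theta \in \Theta$. [Here, as usual, $x = (x_1, x_2, \ldots, x_n)$.]


Let $\psi$ be defined on $\Theta$ and be differentiable there, and let T be an unbiased
estimate of $\psi$ such that $E_\theta T^2 < \infty$ for all $\theta$. Assume that

\[ \begin{aligned} \frac{d}{d\theta} \int T(x) f_\theta(x)\, dx &= \int T(x)\, \frac{\partial f_\theta(x)}{\partial\theta}\, dx && \text{if } f_\theta \text{ is a pdf,} \\ \frac{d}{d\theta} \sum_x T(x) f_\theta(x) &= \sum_x T(x)\, \frac{\partial f_\theta(x)}{\partial\theta} && \text{if } f_\theta \text{ is a pmf,} \end{aligned} \tag{2} \]

for every $\theta \in \Theta$. Let $\varphi$ be any function of $\Theta \to \mathcal{R}$. Then

\[ [\psi'(\theta)]^2 \le E_\theta\{T - \varphi(\theta)\}^2\; E_\theta\Bigl\{\frac{\partial \log f_\theta(X)}{\partial\theta}\Bigr\}^2 \tag{3} \]

for every $\theta \in \Theta$.


For any $\theta_0 \in \Theta$, either $\psi'(\theta_0) = 0$ and equality holds in (3) for $\theta = \theta_0$, or we
have

\[ E_{\theta_0}\{T - \varphi(\theta_0)\}^2 \ge \frac{[\psi'(\theta_0)]^2}{E_{\theta_0}\{\partial \log f_\theta(X)/\partial\theta\}^2}. \tag{4} \]

If, in the latter case, equality holds in (4), then there exists a real number
$K(\theta_0) \ne 0$ such that

\[ T(X) - \varphi(\theta_0) = K(\theta_0)\, \frac{\partial \log f_{\theta_0}(X)}{\partial\theta} \quad \text{with probability 1,} \tag{5} \]

provided that T is not a constant.
Remark 1. Conditions (1) and (2) are frequently known as regularity
conditions. It is clear that $E_\theta\{\partial \log f_\theta(X)/\partial\theta\}^2$ is well defined and satisfies

\[ 0 \le E_\theta\Bigl\{\frac{\partial \log f_\theta(X)}{\partial\theta}\Bigr\}^2 \le \infty. \]


Remark 2. Sufficient conditions on $\{f_\theta, \theta \in \Theta\}$ for (1) and (2) to hold may
be found in P.2.12 and P.2.13.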


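Proof of Theorem 1. It follows from (1) that $E_\theta\{\partial \log f_\theta(X)/\partial\theta\} = 0$ and
from (2) that

\[ E_\theta\Bigl\{T(X)\, \frac{\partial \log f_\theta(X)}{\partial\theta}\Bigr\} = \psi'(\theta), \]

\[ E_\theta\Bigl\{[T(X) - \varphi(\theta)]\, \frac{\partial \log f_\theta(X)}{\partial\theta}\Bigr\} = \psi'(\theta). \]

By the Cauchy-Schwarz inequality, (3) follows immediately.

To prove (4) it suffices to consider either the case where $\psi'(\theta_0) \ne 0$ or
the one where in (3) the equality sign does not hold for $\theta = \theta_0$. In either
case it follows from (3) that $E_{\theta_0}\{\partial \log f_\theta(X)/\partial\theta\}^2 > 0$, and (4) follows.

If the equality holds in (4), then necessarily $\psi'(\theta_0) \ne 0$. Therefore from
the Cauchy-Schwarz inequality there exists a real number $K(\theta_0)$ such that

\[ T(X) - \varphi(\theta_0) = K(\theta_0)\, \frac{\partial \log f_{\theta_0}(X)}{\partial\theta} \]

with probability 1, and (5) holds. Since T is not a constant, it also follows that $K(\theta_0) \ne 0$.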
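Remark 3. If we take $\varphi = \psi$ in (4), we get

\[ \operatorname{var}_\theta(T(X)) \ge \frac{[\psi'(\theta)]^2}{E_\theta\{\partial \log f_\theta(X)/\partial\theta\}^2}. \tag{6} \]

In particular, if $\psi(\theta) = \theta$, then (6) reduces to

\[ \operatorname{var}_\theta(T(X)) \ge \Bigl[ E_\theta\Bigl\{\frac{\partial \log f_\theta(X)}{\partial\theta}\Bigr\}^2 \Bigr]^{-1}. \tag{7} \]


Remark 4. If $X = (X_1, X_2, \ldots, X_n)$ is a sample from a pdf (pmf) $f_\theta(x)$, $\theta \in \Theta$,
then

\[ E_\theta\Bigl\{\frac{\partial \log f_\theta(X)}{\partial\theta}\Bigr\}^2 = n\, E_\theta\Bigl\{\frac{\partial \log f_\theta(X_1)}{\partial\theta}\Bigr\}^2, \]

and (6) reduces to

\[ \operatorname{var}_\theta(T(X)) \ge \frac{[\psi'(\theta)]^2}{n\, E_\theta\{\partial \log f_\theta(X_1)/\partial\theta\}^2}. \]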


Example 1. Let $X \sim b(n, p)$; $\Theta = (0, 1) \subset \mathcal{R}$. The only condition that need
be checked is differentiability under the summation sign. We have

\[ \psi(p) = E_p T(X) = \sum_{x=0}^n \binom{n}{x} T(x)\, p^x (1 - p)^{n-x}, \]

which is a polynomial in p and hence can be differentiated with respect to
p. For any unbiased estimate T(X) of p we have

\[ \operatorname{var}_p(T(X)) \ge \frac{p(1 - p)}{n}, \tag{8} \]

and since

\[ \operatorname{var}_p\Bigl(\frac{X}{n}\Bigr) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}, \]

it follows that the variance of the estimate X/n attains the lower bound of
the FCR inequality, and hence the estimate is the UMVUE.

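Example 2. Let X be $U[0, \theta]$. Then

\[ f_\theta(x) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta, \\[1ex] 0 & \text{otherwise,} \end{cases} \]

and $\Theta = (0, \infty)$.

Here the regularity conditions do not hold. Let us compute $nE_\theta\{\partial \log f_\theta(X_1)/\partial\theta\}^2$.
We have $E_\theta\{\partial \log f_\theta(X_1)/\partial\theta\}^2 = 1/\theta^2$, so that the lower bound of the FCR inequality
is $\theta^2/n$. We leave the reader to check that $[(n + 1)/n] \max(X_1, X_2, \ldots, X_n)$ is
unbiased for $\theta$ and has variance $\theta^2/[n(n + 2)]$, which is much smaller than the FCR
lower bound. This is not surprising since the regularity conditions do not hold.
Note that $[(n + 1)/n] \max(X_1, \ldots, X_n)$ is the UMVUE since $\max(X_1, X_2, \ldots, X_n)$ is a
complete sufficient statistic.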
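
The check left to the reader in Example 2 reduces to two moments of the maximum. Using the pdf $n x^{n-1}/\theta^n$ on $(0, \theta)$ from Example 8.3.18, one has $E M_n^k = n\theta^k/(n + k)$. The Python sketch below (an added illustration; n = 5 and $\theta = 3$ are arbitrary) uses this to compute the mean and variance of $[(n + 1)/n] M_n$ exactly and compares the variance with $\theta^2/n$.

```python
from fractions import Fraction

def max_moment(k, n, theta):
    """E M_n^k = n theta^k / (n + k) for the maximum of n iid U[0, theta] rv's."""
    return Fraction(n, n + k) * Fraction(theta)**k

def scaled_max_mean_and_variance(n, theta):
    """Mean and variance of [(n + 1)/n] M_n."""
    c = Fraction(n + 1, n)
    mean = c * max_moment(1, n, theta)
    variance = c**2 * max_moment(2, n, theta) - mean**2
    return mean, variance

def fcr_value(n, theta):
    """The quantity theta^2 / n computed formally in Example 2."""
    return Fraction(theta)**2 / n

mean, variance = scaled_max_mean_and_variance(5, 3)
print(mean, variance, fcr_value(5, 3))
```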
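Example 3. Let $X \sim P(\lambda)$. We leave the reader to check that the regularity
conditions are satisfied and

\[ \operatorname{var}_\lambda(T(X)) \ge \lambda. \]

Since T(X) = X has variance $\lambda$, X is the UMVUE of $\lambda$. Similarly, if we
take a sample of size n from $P(\lambda)$, we can show that

\[ \operatorname{var}_\lambda(T(X_1, \ldots, X_n)) \ge \frac{\lambda}{n} \]

and $\bar{X}$ is the UMVUE.
Let us next consider the problem of unbiased estimation of $\psi(\lambda) = e^{-\lambda}$,
based on a sample of size 1. The estimate

\[ \delta(X) = \begin{cases} 1 & \text{if } X = 0, \\ 0 & \text{if } X \ge 1, \end{cases} \]

is unbiased for $\psi(\lambda)$ since

\[ E_\lambda \delta(X) = E_\lambda[\delta(X)]^2 = P_\lambda\{X = 0\} = e^{-\lambda}. \]

Also,

\[ \operatorname{var}_\lambda(\delta(X)) = e^{-\lambda}(1 - e^{-\lambda}). \]

To compute the FCR lower bound we have

\[ \log f_\lambda(x) = x \log \lambda - \lambda - \log x!. \]

This has to be differentiated with respect to $e^{-\lambda}$, since we want a lower
bound for an estimate of the parameter $e^{-\lambda}$. Let $\theta = e^{-\lambda}$. Then

\[ \log f_\theta(x) = x \log\log\frac{1}{\theta} + \log\theta - \log x!, \]

\[ \frac{\partial}{\partial\theta} \log f_\theta(x) = x\, \frac{1}{\theta \log\theta} + \frac{1}{\theta}, \]

and

\[ E_\theta\Bigl\{\frac{\partial}{\partial\theta} \log f_\theta(X)\Bigr\}^2 = \frac{1}{\theta^2}\Bigl\{ 1 + \frac{2}{\log\theta} \log\frac{1}{\theta} + \frac{1}{(\log\theta)^2}\Bigl[ \log\frac{1}{\theta} + \Bigl(\log\frac{1}{\theta}\Bigr)^2 \Bigr] \Bigr\} = e^{2\lambda}\Bigl\{ 1 - 2 + \frac{1}{\lambda^2}(\lambda + \lambda^2) \Bigr\} = \frac{e^{2\lambda}}{\lambda}, \]

so that

\[ \operatorname{var}_\theta T(X) \ge \lambda e^{-2\lambda}, \]

where $\theta = e^{-\lambda}$.
Since $e^{-\lambda}(1 - e^{-\lambda}) > \lambda e^{-2\lambda}$ for $\lambda > 0$, we see that $\operatorname{var}(\delta(X))$ is greater than
the lower bound obtained from the FCR inequality. We show next that $\delta(X)$
is the only unbiased estimate of $\theta$, and hence is the UMVUE.
If h is any unbiased estimate of $\theta$, it must satisfy $E_\theta h(X) = \theta$. Now the
pmf of X in terms of $\theta$ is

\[ f_\theta(x) = \frac{\theta\, [\log(1/\theta)]^x}{x!}, \quad x = 0, 1, 2, \ldots, \]

so that

\[ \theta = E_\theta h(X) = \theta \sum_{k=0}^\infty h(k)\, \frac{[\log(1/\theta)]^k}{k!}, \quad 0 < \theta < 1. \]

It follows that h(0) = 1 and h(k) = 0 for k = 1, 2, ..., that is, that
$h(X) = \delta(X)$, as asserted.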
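
The gap between the variance of $\delta(X)$ and the FCR bound in Example 3 can be tabulated. The Python sketch below is an added illustration (function names and the series truncation at 200 terms are assumptions); it also recomputes $E_\theta\{\partial \log f_\theta(X)/\partial\theta\}^2$ by direct summation over the Poisson pmf as a check on the closed form $e^{2\lambda}/\lambda$.

```python
import math

def variance_delta(lam):
    """var of delta(X) = 1{X = 0} for X ~ P(lam)."""
    q = math.exp(-lam)
    return q * (1.0 - q)

def fcr_bound(lam):
    """FCR lower bound lam * e^{-2 lam} for unbiased estimates of e^{-lam}."""
    return lam * math.exp(-2.0 * lam)

def information_in_theta(lam, terms=200):
    """E{d log f_theta(X)/d theta}^2 with theta = e^{-lam}, summed numerically."""
    theta = math.exp(-lam)
    total = 0.0
    prob = math.exp(-lam)  # P{X = 0}
    for x in range(terms):
        score = x / (theta * math.log(theta)) + 1.0 / theta
        total += prob * score**2
        prob *= lam / (x + 1)  # Poisson recursion P{X = x + 1}
    return total

for lam in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(lam, variance_delta(lam), fcr_bound(lam))
```

The strict inequality is equivalent to $e^\lambda - 1 > \lambda$, so the bound is never attained, although the two quantities are close for small $\lambda$.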


We next consider an inequality due to Chapman, Robbins, and Kiefer 
(the CRK inequality) that gives a lower bound for the variance of an estimate 
but does not require regularity conditions of the Fréchet-Cramér-Rao type.


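Theorem 2 (Chapman and Robbins [11], Kiefer [58]). Let $\Theta \subset \mathcal{R}$ and
$\{f_\theta(x): \theta \in \Theta\}$ be a class of pdf's (pmf's). Let $\psi$ be defined on $\Theta$, and let $T$
be an unbiased estimate of $\psi(\theta)$ with $E_\theta T^2 < \infty$ for all $\theta \in \Theta$. If $\theta \ne \varphi$,
assume that $f_\theta$ and $f_\varphi$ are different and assume further that there exists a
$\varphi \in \Theta$ such that $\theta \ne \varphi$ and

$$S(\theta) = \{f_\theta(x) > 0\} \supset S(\varphi) = \{f_\varphi(x) > 0\}. \tag{9}$$

Then

$$\operatorname{var}_\theta(T(X)) \ge \sup_{\{\varphi:\ S(\varphi) \subset S(\theta),\ \varphi \ne \theta\}} \frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta\{f_\varphi(X)/f_\theta(X)\}} \tag{10}$$

for all $\theta \in \Theta$.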


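Proof. Since $T$ is unbiased for $\psi$, $E_\varphi T(X) = \psi(\varphi)$ for all $\varphi \in \Theta$. Hence, for
$\varphi \ne \theta$,

$$\int_{S(\theta)} T(x)\,\frac{f_\varphi(x) - f_\theta(x)}{f_\theta(x)}\,f_\theta(x)\,dx = \psi(\varphi) - \psi(\theta), \tag{11}$$

which yields

$$\operatorname{cov}_\theta\left\{T(X),\ \frac{f_\varphi(X)}{f_\theta(X)} - 1\right\} = \psi(\varphi) - \psi(\theta).$$

Using the Cauchy-Schwarz inequality, we get

$$\operatorname{cov}^2_\theta\left\{T(X),\ \frac{f_\varphi(X)}{f_\theta(X)} - 1\right\} \le \operatorname{var}_\theta(T(X))\,\operatorname{var}_\theta\left\{\frac{f_\varphi(X)}{f_\theta(X)} - 1\right\} = \operatorname{var}_\theta(T(X))\,\operatorname{var}_\theta\left\{\frac{f_\varphi(X)}{f_\theta(X)}\right\}.$$

Thus

$$\operatorname{var}_\theta(T(X)) \ge \frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta\{f_\varphi(X)/f_\theta(X)\}},$$

and the result follows. In the discrete case it is necessary only to replace the
integral in the left side of (11) by a sum. The rest of the proof needs no
change.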


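Remark 5. Inequality (10) holds without any regularity conditions on $f_\theta$
or $\psi(\theta)$. We will show that it covers some nonregular cases of the FCR
inequality. Sometimes (10) is available in an alternative form. Let $\theta$ and
$\theta + \delta$ $(\delta \ne 0)$ be any two distinct values in $\Theta$ such that $S(\theta + \delta) \subset S(\theta)$,
and take $\psi(\theta) = \theta$. Write

$$J = J(\theta, \delta) = \frac{1}{\delta^2}\left\{\left(\frac{f_{\theta+\delta}(X)}{f_\theta(X)}\right)^2 - 1\right\}.$$

Then (10) can be written as

$$\operatorname{var}_\theta(T(X)) \ge \frac{1}{\inf_\delta E_\theta J}, \tag{12}$$

where the infimum is taken over all $\delta \ne 0$ such that $S(\theta + \delta) \subset S(\theta)$.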


Remark 6. Inequality (10) applies if the parameter space is discrete, but the 
Fréchet-Cramér-Rao regularity conditions do not hold in this case. 


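Example 4. Let $X$ be $U[0, \theta]$. In Example 2 we showed that the regularity
conditions of the FCR inequality do not hold in this case. Let $\psi(\theta) = \theta$. If $\varphi < \theta$,
then $S(\varphi) \subset S(\theta)$. Also,

$$E_\theta\left\{\frac{f_\varphi(X)}{f_\theta(X)}\right\}^2 = \int_0^\varphi \left(\frac{\theta}{\varphi}\right)^2 \frac{1}{\theta}\,dx = \frac{\theta}{\varphi}.$$

Thus

$$\operatorname{var}_\theta(T(X)) \ge \sup_{\varphi:\ \varphi < \theta} \frac{(\varphi - \theta)^2}{(\theta/\varphi) - 1} = \sup_{\varphi:\ \varphi < \theta}\{\varphi(\theta - \varphi)\} = \frac{\theta^2}{4}$$

for any unbiased estimate $T(X)$ of $\theta$. $X$ is a complete sufficient statistic,
and $2X$ is unbiased for $\theta$ so that $T(X) = 2X$ is the UMVUE. Also

$$\operatorname{var}_\theta(2X) = 4\operatorname{var} X = \frac{\theta^2}{3} > \frac{\theta^2}{4}.$$

Thus the lower bound $\theta^2/4$ of the CRK inequality is not achieved by any
unbiased estimate of $\theta$.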
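
The supremum in Example 4 can be approximated on a grid to confirm that it equals $\theta^2/4$ and falls short of the UMVUE variance $\theta^2/3$. This is only a sketch; the grid size and the value of theta are arbitrary choices made for illustration.

```python
def crk_bound_uniform(theta, grid=10000):
    """Approximate sup over 0 < phi < theta of phi * (theta - phi) on a grid."""
    best = 0.0
    for i in range(1, grid):
        phi = theta * i / grid
        best = max(best, phi * (theta - phi))
    return best

theta = 2.0
crk = crk_bound_uniform(theta)   # CRK lower bound, approximately theta**2 / 4
umvue_var = theta ** 2 / 3       # variance of the UMVUE 2X
print(crk, theta ** 2 / 4, umvue_var)
```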


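Example 5. Let $X$ have pmf

$$P_N\{X = k\} = \begin{cases} \dfrac{1}{N}, & k = 1, 2, \ldots, N, \\ 0, & \text{otherwise}. \end{cases}$$

Let $\Theta = \{N: N \ge M,\ M > 1 \text{ given}\}$. Take $\psi(N) = N$. Although the regularity
conditions do not hold, (10) is applicable since, for $N \ne N' \in \Theta \subset \mathcal{R}$,

$$S(N) = \{1, 2, \ldots, N\} \supset S(N') = \{1, 2, \ldots, N'\} \qquad \text{if } N' < N.$$

Also, $P_N$ and $P_{N'}$ are different for $N \ne N'$. Thus

$$\operatorname{var}_N(T) \ge \sup_{N' < N} \frac{(N - N')^2}{\operatorname{var}_N\{P_{N'}/P_N\}}.$$

Now

$$\frac{P_{N'}(x)}{P_N(x)} = \begin{cases} \dfrac{N}{N'}, & x = 1, 2, \ldots, N',\ N' < N, \\ 0, & \text{otherwise}, \end{cases}$$

$$E_N\left\{\frac{P_{N'}(X)}{P_N(X)}\right\}^2 = \frac{1}{N}\sum_{1}^{N'}\left(\frac{N}{N'}\right)^2 = \frac{N}{N'},$$

and

$$\operatorname{var}_N\left\{\frac{P_{N'}(X)}{P_N(X)}\right\} = \frac{N}{N'} - 1 > 0 \qquad \text{for } N > N'.$$

It follows that

$$\operatorname{var}_N(T(X)) \ge \sup_{N' < N}\frac{(N - N')^2}{(N - N')/N'} = \sup_{N' < N} N'(N - N').$$

Now

$$\frac{(k - 1)(N - k + 1)}{k(N - k)} < 1 \qquad \text{if and only if } k < \frac{N + 1}{2},$$

so that $N'(N - N')$ increases as long as $N' < (N + 1)/2$ and decreases if
$N' > (N + 1)/2$. The maximum is achieved at $[(N + 1)/2]$. Therefore

$$\operatorname{var}_N(T(X)) \ge \left[\frac{N + 1}{2}\right]\left\{N - \left[\frac{N + 1}{2}\right]\right\},$$

where $[x]$ is the largest integer $\le x$.

In this case $X$ is a complete sufficient statistic, and $2X - 1$ is unbiased for $N$
and hence is the UMVUE for $N$. The variance of the UMVUE is $(N^2 - 1)/3$,
which is $> [(N + 1)/2]\{N - [(N + 1)/2]\}$ for $N > 2$.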
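
The closed form for the CRK bound in Example 5 can be compared with a brute-force maximization of $N'(N - N')$. In this sketch every $N'$ in $1, \ldots, N - 1$ is treated as admissible, as in the supremum computed in the example; the function names are illustrative.

```python
def crk_bound_brute_force(N):
    """Maximize N' * (N - N') over N' = 1, ..., N - 1."""
    return max(n_prime * (N - n_prime) for n_prime in range(1, N))

def crk_bound_closed_form(N):
    """[(N + 1) / 2] * (N - [(N + 1) / 2]), with [x] the integer part of x."""
    m = (N + 1) // 2
    return m * (N - m)

def umvue_variance(N):
    """Variance of 2X - 1 when X is uniform on {1, ..., N}."""
    return (N * N - 1) / 3

for N in (2, 3, 10, 11):
    print(N, crk_bound_brute_force(N), crk_bound_closed_form(N), umvue_variance(N))
```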


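Example 6. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. Let us compute $J$ for $\delta \ne 0$:

$$J = \frac{1}{\delta^2}\left[\left\{\frac{f_{\sigma+\delta}(X)}{f_\sigma(X)}\right\}^2 - 1\right] = \frac{1}{\delta^2}\left[\left(\frac{\sigma}{\sigma+\delta}\right)^{2n}\exp\left\{-\frac{\sum X_i^2}{(\sigma+\delta)^2} + \frac{\sum X_i^2}{\sigma^2}\right\} - 1\right] = \frac{1}{\delta^2}\left[\left(\frac{\sigma}{\sigma+\delta}\right)^{2n}\exp\left\{c\,\frac{\sum X_i^2}{\sigma^2}\right\} - 1\right],$$

where $c = (\delta^2 + 2\sigma\delta)/(\sigma + \delta)^2$.
Since $\sum X_i^2/\sigma^2 \sim \chi^2(n)$,

$$E_\sigma J = \frac{1}{\delta^2}\left[\left(\frac{\sigma}{\sigma+\delta}\right)^{2n}(1 - 2c)^{-n/2} - 1\right] \qquad \text{for } c < \tfrac{1}{2}.$$

Let $k = \delta/\sigma$; then

$$c = \frac{2k + k^2}{(1 + k)^2}, \qquad 1 - 2c = \frac{1 - 2k - k^2}{(1 + k)^2},$$

and

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[(1 + k)^{-n}(1 - 2k - k^2)^{-n/2} - 1\right].$$

Here $1 + k > 0$ and $1 - 2c > 0$, so that $1 - 2k - k^2 > 0$, implying
$-\sqrt{2} < k + 1 < \sqrt{2}$ and also $k > -1$. Thus $-1 < k < \sqrt{2} - 1$ and
$k \ne 0$. Also,

$$\lim_{k \to 0} E_\sigma J = \lim_{k \to 0}\frac{(1 + k)^{-n}(1 - 2k - k^2)^{-n/2} - 1}{k^2\sigma^2} = \frac{2n}{\sigma^2}$$

by L'Hospital's rule. We leave the reader to check that the reciprocal of this limit,
$\sigma^2/(2n)$, is the FCR lower bound for $\operatorname{var}_\sigma(T(X))$. But the minimum value of $E_\sigma J$ is not achieved
in the neighborhood of $k = 0$, so that the CRK inequality is sharper than
the FCR inequality. Next, we show that for $n = 2$ we can do better with
the CRK inequality. We have

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[\frac{1}{(1 - 2k - k^2)(1 + k)^2} - 1\right] = \frac{(k + 2)^2}{\sigma^2(1 + k)^2(1 - 2k - k^2)}, \qquad -1 < k < \sqrt{2} - 1,\ k \ne 0.$$

For $k = -.1607$ we achieve the lower bound as $(E_\sigma J)^{-1} = .2698\sigma^2$, so that
$\operatorname{var}_\sigma(T(X)) \ge .2698\sigma^2 > \sigma^2/4$. Finally, we show that this bound is by no
means the best available; it is possible to improve on the Chapman-Robbins-Kiefer
bounds too in some cases. Take

$$T(X_1, X_2, \ldots, X_n) = \frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\sqrt{\frac{\sum X_i^2}{2}}$$

to be an estimate of $\sigma$. Now $E_\sigma T = \sigma$ and

$$E_\sigma T^2 = \frac{\sigma^2}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right)^2 E_\sigma\left\{\frac{\sum X_i^2}{\sigma^2}\right\} = \frac{n\sigma^2}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right)^2,$$

so that

$$\operatorname{var}_\sigma(T) = \sigma^2\left\{\frac{n}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n + 1)/2]}\right)^2 - 1\right\}.$$

For $n = 2$,

$$\operatorname{var}_\sigma(T) = \sigma^2\left[\frac{4}{\pi} - 1\right] = .2732\sigma^2,$$

which is $> .2698\sigma^2$, the CRK bound. Note that $T$ is the UMVUE.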
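
The numerical values quoted in Example 6 for $n = 2$ can be reproduced with a simple grid search over $k$. This is a sketch under the assumption that a uniform grid on $(-1, \sqrt{2} - 1)$ is fine enough; the step count is an arbitrary illustrative choice.

```python
import math

def crk_bound_n2(k):
    """(E J)^{-1} / sigma^2 for n = 2, as a function of k = delta / sigma."""
    return (1 + k) ** 2 * (1 - 2 * k - k * k) / (k + 2) ** 2

steps = 20000
upper = math.sqrt(2) - 1
best_k, best_value = None, -1.0
for i in range(1, steps):
    k = -1 + i * (upper + 1) / steps      # grid point in (-1, sqrt(2) - 1)
    if abs(k) < 1e-9:                      # k = 0 is excluded
        continue
    value = crk_bound_n2(k)
    if value > best_value:
        best_k, best_value = k, value

umvue_variance = 4 / math.pi - 1           # var(T) / sigma^2 for n = 2
print(best_k, best_value, umvue_variance)
```

The search locates the maximizing $k$ near $-.1607$, with the FCR value $.25$ below the CRK value and the UMVUE variance above it.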


Remark 7. In general the CRK inequality is at least as sharp as the FCR inequality.
See Chapman and Robbins [11], pages 584-585, for details. 


We next introduce the concept of efficiency. 


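Definition 1. Let $T_1$, $T_2$ be two unbiased estimates for a parameter $\theta$.
Suppose that $E_\theta T_1^2 < \infty$, $E_\theta T_2^2 < \infty$. We define the efficiency of $T_1$ relative
to $T_2$ by

$$\operatorname{eff}_\theta(T_1 \mid T_2) = \frac{\operatorname{var}_\theta(T_1)}{\operatorname{var}_\theta(T_2)} \tag{13}$$

and say that $T_1$ is more efficient than $T_2$ if

$$\operatorname{eff}_\theta(T_1 \mid T_2) < 1. \tag{14}$$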


It is usual to consider the performance of an unbiased estimate by com- 
paring its variance with the lower bound given by the FCR inequality. 


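Definition 2. Assume that the regularity conditions of the FCR inequality
are satisfied by the family of df's $\{F_\theta, \theta \in \Theta\}$, $\Theta \subset \mathcal{R}$. We say that an
unbiased estimate $T$ for parameter $\theta$ is most efficient for the family $\{F_\theta\}$ if

$$\operatorname{var}_\theta(T) = \left[E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}^2\right]^{-1}. \tag{15}$$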


Definition 3. Let $T$ be the most efficient estimate for the regular family of df's
$\{F_\theta, \theta \in \Theta\}$. Then the efficiency of any unbiased estimate $T_1$ of $\theta$ is defined as

$$\operatorname{eff}_\theta(T_1) = \operatorname{eff}_\theta(T_1 \mid T) = \frac{\operatorname{var}_\theta(T_1)}{\operatorname{var}_\theta(T)}. \tag{16}$$

Clearly the efficiency of the most efficient estimate is 1, and the efficiency
of any unbiased estimate $T_1$ is $\ge 1$.


Definition 4. We say that an estimate $T_n$ is asymptotically (most) efficient if

$$\lim_{n \to \infty} \operatorname{eff}_\theta(T_n) = 1 \tag{17}$$

and $T_n$ is at least asymptotically unbiased in the sense that $\lim_{n \to \infty} E_\theta T_n = \theta$.
Here $n$ is the sample size.


Remark 8. Definition 2, although in common usage, has many drawbacks. 
We have already seen cases in which the regularity conditions are not 
satisfied and yet UMVUE's exist. The definition does not cover such cases. 
Moreover, in many cases where the regularity conditions are satisfied and 
UMVUE's exist, the UMVUE is not most efficient since the variance of 
the best estimate (the UMVUE) does not achieve the lower bound of the 
FCR inequality. 


Example 7. Let X ~ b(n, p). Then we have seen in Example 1 that X/n 
is the UMVUE since its variance achieves the lower bound of the FCR 
inequality. It follows that X/n is most efficient. 


Example 8. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown. Then $(\bar{X}, S^2)$ is jointly sufficient for $(\mu, \sigma^2)$. Consider
the estimate $S^2$ for $\sigma^2$. Since $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$, we have
$\operatorname{var}(S^2) = 2\sigma^4/(n - 1)$. The FCR inequality lower bound for an unbiased
estimate of $\sigma^2$ can be shown to be $2\sigma^4/n$, so that

$$\operatorname{eff}(S^2) = \frac{2\sigma^4/(n - 1)}{2\sigma^4/n} = \frac{n}{n - 1} > 1.$$

However, since $\operatorname{eff}(S^2) \to 1$ as $n \to \infty$, $S^2$ is asymptotically efficient.


Finally we explore the relationship between most efficient unbiased esti- 
mates and UMVUE's. 


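In Theorem 1 we showed that, if $\psi'(\theta) \ne 0$ and $T$ is most efficient for $\psi(\theta)$,
then

$$\frac{1}{k(\theta)}\,[T(x) - \psi(\theta)] = \frac{\partial \log f_\theta(x)}{\partial\theta} \qquad \text{for all } \theta \in \Theta. \tag{18}$$

Let us assume for simplicity of notation that $\Theta = \mathcal{R}$ and that

$$C(\theta) = \int_{-\infty}^{\theta}\frac{1}{k(\theta)}\,d\theta, \qquad \Psi(\theta) = \int_{-\infty}^{\theta}\frac{\psi(\theta)}{k(\theta)}\,d\theta \tag{19}$$

are defined, and let $\lim_{\theta \to -\infty} \log f_\theta(x) = A(x)$. Then integrating (18) with respect
to $\theta$ yields

$$f_\theta(x) = \exp\{T(x)C(\theta) - \Psi(\theta) + A(x)\} \tag{20}$$

for all $x \in \mathcal{R}_n$ and all $\theta \in \Theta$. This is a one-parameter exponential family, and
we see that $T$ is sufficient.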
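We have thus shown that the most efficient estimate $T$ of a parameter $\psi(\theta)$
is sufficient and satisfies (18). In the converse direction we can state that, if
$T$ is sufficient for $\psi(\theta)$ and satisfies (18), it is most efficient. To see this, first
note that sufficiency of $T$ implies that the conditional distribution, given
$T = t$, is independent of $\theta$. Using (18), we have

$$E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}^2 = [k(\theta)]^{-2}\operatorname{var}_\theta(T(X)). \tag{21}$$

Using (18) in

$$E_\theta\left\{[T(X) - \psi(\theta)]\,\frac{\partial \log f_\theta(X)}{\partial\theta}\right\} = \psi'(\theta),$$

we get

$$k(\theta)\,E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}^2 = \psi'(\theta). \tag{22}$$

From (21) and (22), it follows that

$$\operatorname{var}_\theta(T) = [\psi'(\theta)]^2\left[E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}^2\right]^{-1},$$

as asserted. We have thus proved the following theorem.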


Theorem 3. A necessary and sufficient condition for an estimate $T$ to be
most efficient is that $T$ be sufficient and (18) hold.


Clearly, an estimate T satisfying the conditions of Theorem 3 will be the 
UMVUE, and two estimates coincide. We emphasize that we have assumed 
the regularity conditions of FCR inequality in making this statement. 


Corollary. If $\{f_\theta\}$ is a one-parameter exponential family given by (5.5.1)
and $Q(\theta)$ has a continuous nonvanishing derivative (on $\Theta$), $T(X)$ is the most
efficient estimate and the UMVUE of $E_\theta(T(X))$.


Remark 9. For a rigorous proof of the fact that, if the variance of an
unbiased estimate attains the FCR lower bound, the family of distributions
must be a one-parameter exponential family, the reader is referred to Wijsman
[140].


PROBLEMS 8.5

1. Are the following families of distributions regular in the sense of Fréchet,
Cramér, and Rao? If so, find the lower bound for the variance of an unbiased
estimate based on a sample of size $n$.

(a) $f_\theta(x) = \theta^{-1}e^{-x/\theta}$ if $x > 0$, and $= 0$ otherwise; $\theta > 0$.

(b) $f_\theta(x) = e^{-(x - \theta)}$ if $\theta < x < \infty$, and $= 0$ otherwise.

(c) $f_\theta(x) = \theta(1 - \theta)^x$, $x = 0, 1, 2, \ldots$; $0 < \theta < 1$.

(d) $f(x; \sigma^2) = (1/\sigma\sqrt{2\pi})\,e^{-x^2/2\sigma^2}$, $-\infty < x < \infty$; $\sigma^2 > 0$.

(e) In (d) find the lower bound for the variance of an unbiased estimate for $\sigma$.

2. Find the CRK lower bound for the variance of an unbiased estimate of $\theta$,
based on a sample of size $n$ from the pdf of Problem 1(b).


3. Find the CRK bound for the variance of an unbiased estimate of $\theta$ in sampling
from $\mathcal{N}(\theta, 1)$.


4. In Problem 1 check to see whether there exists a most efficient estimate in each 
case. 


5. Let $X_1, X_2, \ldots, X_n$ be a sample from a three-point distribution:

$$P\{X = y_1\} = \frac{1 - \theta}{2}, \qquad P\{X = y_2\} = \frac{1}{2}, \qquad P\{X = y_3\} = \frac{\theta}{2},$$

where $0 < \theta < 1$. Does the FCR inequality apply in this case? If so, what is the
lower bound for the variance of an unbiased estimate of $\theta$?


6. Let $X_1, X_2, \ldots, X_n$ be iid rv's with mean $\mu$ and finite variance. What is the
efficiency of the unbiased (and consistent) estimate $[2/n(n + 1)]\sum_{i=1}^n iX_i$ relative
to $\bar{X}$?

7. When does the equality hold in the CRK inequality?

8. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, and let $\psi(\mu) = \mu^2$.

(a) Show that the minimum variance of any estimate of $\mu^2$ from the FCR inequality
is $4\mu^2/n$.

(b) Show that $T(X_1, X_2, \ldots, X_n) = \bar{X}^2 - (1/n)$ is the UMVUE of $\mu^2$ with variance
$(4\mu^2/n + 2/n^2)$.

9. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\alpha)$ rv's.

(a) Show that the estimate $T(X_1, X_2, \ldots, X_n) = (n - 1)/n\bar{X}$ is the UMVUE for
$\alpha$ with variance $\alpha^2/(n - 2)$.

(b) Show that the minimum variance from the FCR inequality is $\alpha^2/n$.

40. In Problem 8.4.16 compute the relative efficiency of 2, with respect to Д,. 


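11. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from $\mathcal{N}(\mu, \sigma_1^2)$
and $\mathcal{N}(\mu, \sigma_2^2)$, respectively, where $\mu$, $\sigma_1^2$, $\sigma_2^2$ are unknown. Let $\rho = \sigma_2^2/\sigma_1^2$ and
$\theta = m/n$, and consider the problem of unbiased estimation of $\mu$.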


(a) If p is known, show that 

By =aX+(l—a)¥ 
j where a c (ДО + 0) is the BLUE of д. Compute var (Де). 
b) If pis unknown, the unbiased estimate, +; n>- 


а... 
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1460 Jag 
is optimum in the neighborhood of p — 1. Find the variance of д. i 
(c) Compute the efficiency of A relative to ĝo. 
(d) Another unbiased estimate of и is 
pFX + 0Y 
aaa er aes 


where Е= 51/05? is an F(m — 1, n — 1) rv. 


8.6 THE METHOD OF MOMENTS 


One of the simplest methods of estimation is the method of moments, which 
we study in this section. In a wide variety of problems the parameter to be 
estimated is some known function of a given (finite) number of moments. 
Let us suppose that we are interested in estimating 


$$\theta = h(m_1, m_2, \ldots, m_k), \tag{1}$$

where $h$ is some known numerical function, and $m_j$ is the $j$th-order moment
of the population distribution that is known to exist for $1 \le j \le k$.


Definition 1. The method of moments consists in estimating $\theta$ by the statistic

$$T(X_1, \ldots, X_n) = h\left(n^{-1}\sum_{1}^{n} X_i,\ n^{-1}\sum_{1}^{n} X_i^2,\ \ldots,\ n^{-1}\sum_{1}^{n} X_i^k\right). \tag{2}$$

To make sure that $T$ is a statistic, we will assume that $h: \mathcal{R}_k \to \mathcal{R}$ is a
Borel-measurable function.


Remark 1. It is easy to extend the method to the estimation of joint
moments. Thus we use $n^{-1}\sum X_iY_i$ to estimate $E(XY)$, and so on.


Remark 2. From the WLLN, $n^{-1}\sum_{i=1}^n X_i^j \xrightarrow{P} EX^j$. Thus, if one is interested
in estimating the population moments, the method of moments leads to
consistent and unbiased estimates. Moreover, the method of moments
estimates in this case are asymptotically normally distributed (see Section
7.3).

Again, if one estimates parameters of the type $\theta$ defined in (1) and $h$ is
a continuous function, the estimates $T(X_1, X_2, \ldots, X_n)$ defined in (2) are
consistent for $\theta$ (see Problem 1). Under some mild conditions on $h$, the
estimate $T$ is also asymptotically normal (see Cramér [18], 386-387).


Example 1. Let $X_1, X_2, \ldots, X_n$ be iid rv's with common mean $\mu$ and
variance $\sigma^2$. Then $\sigma = \sqrt{m_2 - m_1^2}$, and the method of moments estimate for
$\sigma$ is given by

$$T(X_1, \ldots, X_n) = \sqrt{\frac{1}{n}\sum_{1}^{n} X_i^2 - \bar{X}^2} = \sqrt{\frac{1}{n}\sum_{1}^{n}(X_i - \bar{X})^2}.$$

Although $T$ is consistent and asymptotically normal for $\sigma$, it is not unbiased.
In particular, if $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ rv's, we know that $EX_1 = \lambda$
and $\operatorname{var}(X_1) = \lambda$. The method of moments leads to using either $\bar{X}$ or
$\sum_1^n (X_i - \bar{X})^2/n$ as an estimate of $\lambda$. [Note that $\bar{X}$ and $\sum_1^n (X_i - \bar{X})^2/n$ have
different units of measurement.] To avoid this kind of ambiguity we take
the estimate involving the lowest-order sample moment.


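Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b,\ a < b, \\ 0, & \text{otherwise}. \end{cases}$$

Then

$$EX = \frac{a + b}{2} \qquad \text{and} \qquad \operatorname{var}(X) = \frac{(b - a)^2}{12}.$$

The method of moments leads to estimating $EX$ by $\bar{X}$ and $\operatorname{var}(X)$ by
$\sum_1^n (X_i - \bar{X})^2/n$, so that the estimates for $a$ and $b$, respectively, are

$$T_1(X_1, \ldots, X_n) = \bar{X} - \sqrt{\frac{3\sum_1^n (X_i - \bar{X})^2}{n}}$$

and

$$T_2(X_1, \ldots, X_n) = \bar{X} + \sqrt{\frac{3\sum_1^n (X_i - \bar{X})^2}{n}}.$$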
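
The two estimates of Example 2 are easy to compute from data. The following sketch uses a hypothetical helper name and made-up observations; it is meant only to illustrate the formulas.

```python
import math

def mom_uniform(xs):
    """Method of moments estimates (T1, T2) of (a, b) for a U[a, b] sample."""
    n = len(xs)
    mean = sum(xs) / n
    second_central = sum((x - mean) ** 2 for x in xs) / n   # divisor n, not n - 1
    half_width = math.sqrt(3.0 * second_central)
    return mean - half_width, mean + half_width

a_hat, b_hat = mom_uniform([1.0, 2.0, 3.0, 4.0, 5.0])
print(a_hat, b_hat)
```

By construction the fitted uniform distribution reproduces the sample mean and the sample second central moment. Note that the estimates need not bracket all the observations.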


Example 3. Let $X_1, X_2, \ldots, X_N$ be iid $b(n, p)$ rv's, where both $n$ and $p$ are
unknown. The method of moments estimates of $p$ and $n$ are given by

$$\bar{X} = EX = np$$

and

$$\frac{1}{N}\sum_{1}^{N} X_i^2 = EX^2 = np(1 - p) + n^2p^2.$$

Solving for $n$ and $p$, we get the estimate for $p$ as

$$T_1(X_1, \ldots, X_N) = \frac{\bar{X}}{T_2(X_1, \ldots, X_N)},$$

where $T_2(X_1, \ldots, X_N)$ is the estimate for $n$, given by

$$T_2(X_1, X_2, \ldots, X_N) = \frac{(\bar{X})^2}{\bar{X} + (\bar{X})^2 - \sum_1^N X_i^2/N}.$$

Note that $\bar{X} \xrightarrow{P} np$ and $\sum_1^N X_i^2/N \xrightarrow{P} np(1 - p) + n^2p^2$, so that, for $p > 0$,
$T_2$ converges in probability to $n$ and $T_1$ to $p$ (see Remark 2 and Problem 1).
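
The estimates $T_1$ and $T_2$ of Example 3 can be sketched as follows. The function name and the guard against a nonpositive denominator are illustrative additions, not part of the text; the data are chosen so that the sample moments match those of $b(4, \tfrac12)$.

```python
def mom_binomial(xs):
    """Method of moments estimates (n_hat, p_hat) for iid b(n, p) data."""
    size = len(xs)
    m1 = sum(xs) / size                     # first sample moment
    m2 = sum(x * x for x in xs) / size      # second sample moment
    denom = m1 + m1 * m1 - m2
    if denom <= 0:
        raise ValueError("sample moments do not yield a valid estimate of n")
    n_hat = m1 * m1 / denom                 # T2 in the text
    p_hat = m1 / n_hat                      # T1 in the text
    return n_hat, p_hat

# Sample mean 2 and second moment 5 correspond to n = 4, p = 1/2.
n_hat, p_hat = mom_binomial([1, 3])
print(n_hat, p_hat)
```

The estimate of $n$ need not be an integer, and the denominator can be nonpositive when the sample variance is at least the sample mean.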


PROBLEMS 8.6 


1. Let $X_n \xrightarrow{P} a$ and $Y_n \xrightarrow{P} b$, where $a$ and $b$ are constants. Let $h: \mathcal{R}_2 \to \mathcal{R}$ be a
continuous function. Show that $h(X_n, Y_n) \xrightarrow{P} h(a, b)$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(\alpha, \beta)$. Find the method of moments
estimate for $(\alpha, \beta)$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Find the method of moments
estimate for $(\mu, \sigma^2)$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $B(\alpha, \beta)$. Find the method of moments
estimate for $(\alpha, \beta)$.

5. A random sample of size $n$ is taken from the lognormal pdf

$$f(x; \mu, \sigma) = (\sigma\sqrt{2\pi})^{-1}x^{-1}\exp\left\{-\frac{1}{2\sigma^2}(\log x - \mu)^2\right\}, \qquad x > 0.$$

Find the method of moments estimates for $\mu$ and $\sigma^2$.


8.7 MAXIMUM LIKELIHOOD ESTIMATES 


In this section we study a frequently used method of estimation, namely, the 
method of maximum likelihood estimation. Consider the following example. 


Example 1. Let $X \sim b(n, p)$. One observation on $X$ is available, and it is
known that $n$ is either 2 or 3 and $p = \tfrac12$ or $\tfrac13$. Our object is to find an
estimate of the pair $(n, p)$. The following table gives the probability that
$X = x$ for each possible pair $(n, p)$:


x    (2, 1/2)   (2, 1/3)   (3, 1/2)   (3, 1/3)   Maximum Probability

0      1/4        4/9        1/8        8/27             4/9
1      1/2        4/9        3/8        4/9              1/2
2      1/4        1/9        3/8        2/9              3/8
3      0          0          1/8        1/27             1/8


The last column gives the maximum probability in each row, that is, for
each value that $X$ assumes. If the value $x = 1$, say, is observed, it is more
probable that it came from the distribution $b(2, \tfrac12)$ than from any of the
other distributions, and so on. The following estimate is, therefore,
reasonable in that it maximizes the probability of the observed value:

$$(\hat{n}, \hat{p})(x) = \begin{cases} (2, \tfrac13) & \text{if } x = 0, \\ (2, \tfrac12) & \text{if } x = 1, \\ (3, \tfrac12) & \text{if } x = 2, \\ (3, \tfrac12) & \text{if } x = 3. \end{cases}$$

The principle of maximum likelihood essentially assumes that the sample
is representative of the population and chooses as the estimate that value of
the parameter that maximizes the pdf (pmf) $f_\theta(x)$.
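
The table and the resulting estimate in Example 1 can be reproduced exactly with rational arithmetic. This is a minimal sketch; the names `candidates`, `binomial_pmf`, and `mle` are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, x):
    """P{X = x} for X ~ b(n, p), computed exactly when p is a Fraction."""
    if x < 0 or x > n:
        return Fraction(0)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# The four admissible parameter pairs (n, p) of Example 1.
candidates = [(n, p) for n in (2, 3) for p in (Fraction(1, 2), Fraction(1, 3))]

def mle(x):
    """Pair (n, p) that maximizes the probability of the observed value x."""
    return max(candidates, key=lambda pair: binomial_pmf(pair[0], pair[1], x))

for x in range(4):
    row = [binomial_pmf(n, p, x) for n, p in candidates]
    print(x, row, mle(x))
```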


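Definition 1. Let $(X_1, X_2, \ldots, X_n)$ be a random vector with pdf (pmf)
$f_\theta(x_1, x_2, \ldots, x_n)$, $\theta \in \Theta$. The function

$$L(\theta; x_1, x_2, \ldots, x_n) = f_\theta(x_1, x_2, \ldots, x_n), \tag{1}$$

considered as a function of $\theta$, is called the likelihood function.


Usually $\theta$ will be a vector of parameters. If $X_1, X_2, \ldots, X_n$ are iid with pdf
(pmf) $f_\theta(x)$, the likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_\theta(x_i). \tag{2}$$

Let $\Theta \subset \mathcal{R}_k$ and $X = (X_1, X_2, \ldots, X_n)$.


Definition 2. The principle of maximum likelihood estimation consists of
choosing as an estimate of $\theta$ a $\hat\theta(X)$ that maximizes $L(\theta; x_1, x_2, \ldots, x_n)$, that
is, to find a mapping $\hat\theta$ of $\mathcal{R}_n \to \mathcal{R}_k$ that satisfies

$$L(\hat\theta; x_1, x_2, \ldots, x_n) = \sup_{\theta \in \Theta} L(\theta; x_1, x_2, \ldots, x_n). \tag{3}$$

(Constants are not admissible as estimates.)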


If a $\hat\theta$ satisfying (3) exists, we call it a maximum likelihood estimate (MLE).
The method of maximum likelihood estimation attempts to find the mode
[the value that maximizes the pdf (pmf)] of the distribution. The fact that
the mode is usually a poorer estimate than the mean or the median
explains why the small-sample properties of the method are, in general, poor.
For large samples, however, the mode tends to approach the mean (if it
exists) and the median, and the method has many large-sample properties,
as we shall see. [It is not difficult to construct examples in which $\theta$ is the
mean of the distribution but the MLE is not the sample mean. See Problem

It is convenient to work with the logarithm of the likelihood function.
Since log is a monotone function,

$$\log L(\hat\theta; x_1, \ldots, x_n) = \sup_{\theta \in \Theta} \log L(\theta; x_1, \ldots, x_n). \tag{4}$$

Let $\Theta$ be an open subset of $\mathcal{R}_k$, and suppose that $f_\theta(x)$ is a positive,
differentiable function of $\theta$ (that is, the first-order partial derivatives exist in
the components of $\theta$). If a supremum $\hat\theta$ exists, it must satisfy (see P.2.3)
the likelihood equations

$$\frac{\partial \log L(\hat\theta; x_1, \ldots, x_n)}{\partial\theta_j} = 0, \qquad j = 1, 2, \ldots, k, \quad \theta = (\theta_1, \ldots, \theta_k). \tag{5}$$

Any nontrivial root of the likelihood equations (5) is called an MLE in
the loose sense. A parameter value that provides the absolute maximum of
the likelihood function is called an MLE in the strict sense or, simply, an
MLE.


Remark 1. If $\Theta \subset \mathcal{R}$, there may still be many problems. Often the likelihood
equation $\partial L/\partial\theta = 0$ has more than one root, or the likelihood function is
not differentiable everywhere in $\Theta$, or $\hat\theta$ may be a terminal value. Sometimes
the likelihood equation may be quite complicated and difficult to solve
explicitly. In that case one may have to resort to some numerical procedure
to obtain the estimate. Similar remarks apply to the multiparameter case.


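Example 2. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown. Here $\Theta = \{(\mu, \sigma^2),\ -\infty < \mu < \infty,\ \sigma^2 > 0\}$. The
likelihood function is

$$L(\mu, \sigma^2; x_1, \ldots, x_n) = \frac{1}{\sigma^n(2\pi)^{n/2}}\exp\left\{-\sum_{1}^{n}\frac{(x_i - \mu)^2}{2\sigma^2}\right\},$$

and

$$\log L(\mu, \sigma^2; x) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log(2\pi) - \frac{\sum_1^n (x_i - \mu)^2}{2\sigma^2}.$$

The likelihood equations are

$$\frac{1}{\sigma^2}\sum_{1}^{n}(x_i - \mu) = 0$$

and

$$-\frac{n}{2}\,\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{1}^{n}(x_i - \mu)^2 = 0.$$

Solving the first of these equations for $\mu$, we get $\hat\mu = \bar{X}$ and, substituting in the
second, $\hat\sigma^2 = \sum_1^n (X_i - \bar{X})^2/n$. We see that $(\hat\mu, \hat\sigma^2) \in \Theta$ with probability 1.
We show that $(\hat\mu, \hat\sigma^2)$ maximizes the likelihood function. First note that $\bar{X}$
maximizes $L(\mu, \sigma^2; x)$ whatever $\sigma^2$ is, since $L(\mu, \sigma^2; x) \to 0$ as $|\mu| \to \infty$, and
in that case $L(\hat\mu, \sigma^2; x) \to 0$ as $\sigma^2 \to 0$ or $\infty$ whenever $\hat\theta \in \Theta$, $\hat\theta = (\hat\mu, \hat\sigma^2)$.

Note that $\hat\sigma^2$ is not unbiased for $\sigma^2$. Indeed, $E\hat\sigma^2 = [(n - 1)/n]\sigma^2$. But
$n\hat\sigma^2/(n - 1) = S^2$ is unbiased, as we already know. Also, $\hat\mu$ is unbiased, and
both $\hat\mu$ and $\hat\sigma^2$ are consistent. In addition, $\hat\mu$ and $\hat\sigma^2$ are method of moments
estimates for $\mu$ and $\sigma^2$, and $(\hat\mu, \hat\sigma^2)$ is jointly sufficient.

Finally, note that $\hat\mu$ is the MLE of $\mu$ if $\sigma^2$ is known; but if $\mu$ is known, the
MLE of $\sigma^2$ is not $\hat\sigma^2$ but $\sum_1^n (X_i - \mu)^2/n$.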
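
The closed-form solution of the likelihood equations in Example 2 can be checked against the log likelihood directly. The sketch below uses made-up data and illustrative function names.

```python
import math

def normal_mle(xs):
    """MLE (mu_hat, sigma2_hat) for a normal sample; sigma2_hat uses divisor n."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

def log_likelihood(mu, sigma2, xs):
    """Log likelihood of a N(mu, sigma2) sample."""
    n = len(xs)
    return (-0.5 * n * math.log(sigma2) - 0.5 * n * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, sigma2_hat = normal_mle(data)
s2 = len(data) * sigma2_hat / (len(data) - 1)    # unbiased estimate S^2
print(mu_hat, sigma2_hat, s2)
```

Perturbing either coordinate of $(\hat\mu, \hat\sigma^2)$ lowers `log_likelihood`, which is consistent with the maximization argument in the example.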


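Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from pmf

$$P_N(k) = \begin{cases} \dfrac{1}{N}, & k = 1, 2, \ldots, N, \\ 0, & \text{otherwise}. \end{cases}$$

The likelihood function is

$$L(N; k_1, k_2, \ldots, k_n) = \begin{cases} \dfrac{1}{N^n}, & 1 \le \max(k_1, \ldots, k_n) \le N, \\ 0, & \text{otherwise}. \end{cases}$$

Clearly the MLE of $N$ is given by

$$\hat{N}(X_1, X_2, \ldots, X_n) = \max(X_1, X_2, \ldots, X_n),$$

for, if we take any $\hat\alpha < \hat{N}$ as the MLE, then $P_{\hat\alpha}(k_1, k_2, \ldots, k_n) = 0$; and
if we take any $\hat\beta > \hat{N}$ as the MLE, then $P_{\hat\beta}(k_1, k_2, \ldots, k_n) = 1/\hat\beta^n
< 1/\hat{N}^n = P_{\hat{N}}(k_1, k_2, \ldots, k_n)$.


We see that the MLE $\hat{N}$ is consistent, sufficient, and complete, but not
unbiased.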


Example 4. Consider the hypergeometric pmf

$$P_N(x) = \begin{cases} \dfrac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}, & \max(0, n - N + M) \le x \le \min(n, M), \\ 0, & \text{otherwise}. \end{cases}$$

To find the MLE $\hat{N} = \hat{N}(X)$ of $N$ consider the ratio

$$R(N) = \frac{P_N(x)}{P_{N-1}(x)} = \frac{N - n}{N}\cdot\frac{N - M}{N - M - n + x}.$$

For values of $N$ for which $R(N) > 1$, $P_N(x)$ increases with $N$, and for
values of $N$ for which $R(N) < 1$, $P_N(x)$ is a decreasing function of $N$:

$$R(N) > 1 \qquad \text{if and only if } N < \frac{nM}{x},$$

and

$$R(N) < 1 \qquad \text{if and only if } N > \frac{nM}{x}.$$

It follows that $P_N(x)$ reaches its maximum value where $N \approx nM/x$. Thus
$\hat{N}(X) = [nM/X]$, where $[x]$ denotes the largest integer $\le x$.
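
The closed form $[nM/x]$ of Example 4 can be compared with a brute-force search over $N$. The sketch assumes $x \ge 1$ and an arbitrary search limit `N_max`, both of which are illustrative choices; when $nM/x$ is an integer, $N = nM/x - 1$ gives the same likelihood.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(N, M, n, x):
    """P_N(x): x marked items in a sample of n from N items, M of them marked."""
    if x < max(0, n - N + M) or x > min(n, M):
        return Fraction(0)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

def mle_population_size(M, n, x, N_max):
    """Brute-force maximizer of P_N(x) over N = M + n - x, ..., N_max."""
    candidates = range(M + n - x, N_max + 1)
    return max(candidates, key=lambda N: hypergeometric_pmf(N, M, n, x))

M, n, x = 10, 5, 3
print(mle_population_size(M, n, x, 200), (n * M) // x)
```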


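Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[\theta - \tfrac12, \theta + \tfrac12]$. The
likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \theta - \tfrac12 \le \min(x_1, \ldots, x_n) \le \max(x_1, \ldots, x_n) \le \theta + \tfrac12, \\ 0 & \text{otherwise}. \end{cases}$$

Thus $L(\theta; x)$ attains its maximum provided that

$$\theta - \tfrac12 \le \min(x_1, \ldots, x_n) \qquad \text{and} \qquad \theta + \tfrac12 \ge \max(x_1, \ldots, x_n),$$

or when

$$\theta \le \min(x_1, \ldots, x_n) + \tfrac12 \qquad \text{and} \qquad \theta \ge \max(x_1, \ldots, x_n) - \tfrac12.$$

It follows that every statistic $T(X_1, X_2, \ldots, X_n)$ such that

$$\max_{1 \le i \le n} X_i - \tfrac12 \le T(X_1, X_2, \ldots, X_n) \le \min_{1 \le i \le n} X_i + \tfrac12 \tag{6}$$

is an MLE of $\theta$. Indeed, for $0 < \alpha < 1$,

$$T_\alpha(X_1, \ldots, X_n) = \max_{1 \le i \le n} X_i - \tfrac12 + \alpha\left(1 + \min_{1 \le i \le n} X_i - \max_{1 \le i \le n} X_i\right)$$

lies in interval (6), and hence for each $\alpha$, $0 < \alpha < 1$, $T_\alpha(X_1, \ldots, X_n)$ is an
MLE of $\theta$. In particular, if $\alpha = \tfrac12$,

$$T_{1/2}(X_1, \ldots, X_n) = \frac{\min X_i + \max X_i}{2}$$

is an MLE of $\theta$.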


Example 6. Let $X \sim b(1, p)$, $p \in [\tfrac14, \tfrac34]$. In this case $L(p; x) = p^x(1 - p)^{1-x}$,
$x = 0, 1$, and we cannot differentiate $L(p; x)$ to get the MLE of $p$, since that
would lead to $\hat{p} = x$, a value that does not lie in $\Theta = [\tfrac14, \tfrac34]$. We have

$$L(p; x) = \begin{cases} p, & x = 1, \\ 1 - p, & x = 0, \end{cases}$$

which is maximized if we choose $\hat{p}(x) = \tfrac14$ if $x = 0$, and $= \tfrac34$ if $x = 1$.
Thus the MLE of $p$ is given by

$$\hat{p}(X) = \frac{2X + 1}{4}.$$

Note that $E_p\hat{p}(X) = (2p + 1)/4$, so that $\hat{p}$ is biased. Also, the mean square
error for $\hat{p}$ is

$$E_p(\hat{p}(X) - p)^2 = \tfrac{1}{16}E_p(2X + 1 - 4p)^2 = \tfrac{1}{16}.$$

In the sense of the MSE, the MLE is worse than the trivial estimate
$\delta(X) = \tfrac12$, for $E_p(\tfrac12 - p)^2 = (\tfrac12 - p)^2 \le \tfrac{1}{16}$ for $p \in [\tfrac14, \tfrac34]$.
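
The two mean square errors compared in Example 6 can be evaluated directly from the two-point distribution of $X$. The following sketch uses illustrative function names.

```python
def mse_mle(p):
    """E_p (p_hat(X) - p)^2 with p_hat(X) = (2X + 1) / 4 and X ~ b(1, p)."""
    return (1 - p) * (0.25 - p) ** 2 + p * (0.75 - p) ** 2

def mse_trivial(p):
    """E_p (1/2 - p)^2 for the constant estimate 1/2."""
    return (0.5 - p) ** 2

for p in (0.25, 0.4, 0.5, 0.6, 0.75):
    print(p, mse_mle(p), mse_trivial(p))
```

The first function returns $1/16$ for every $p$, while the second never exceeds $1/16$ on $[\tfrac14, \tfrac34]$ and equals it only at the endpoints.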

Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's, and suppose that $p \in (0, 1)$.
If $(0, 0, \ldots, 0)$ $((1, 1, \ldots, 1))$ is observed, $\bar{x} = 0$ $(\bar{x} = 1)$ is the MLE, which
is not an admissible value of $p$. Hence an MLE does not exist.

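Example 8 (Oliver [87]). This example illustrates a distribution for which
an MLE is necessarily an actual observation, but not necessarily any
particular observation. Let $X_1, X_2, \ldots, X_n$ be a sample from pdf

$$f_\theta(x) = \begin{cases} \dfrac{2}{\alpha}\,\dfrac{x}{\theta}, & 0 \le x \le \theta, \\[2mm] \dfrac{2}{\alpha}\,\dfrac{\alpha - x}{\alpha - \theta}, & \theta < x \le \alpha, \\[2mm] 0, & \text{otherwise}, \end{cases}$$

where $\alpha > 0$ is a (known) constant. The likelihood function is

$$L(\theta; x_1, x_2, \ldots, x_n) = \left(\frac{2}{\alpha}\right)^n \prod_{x_i \le \theta}\left(\frac{x_i}{\theta}\right)\prod_{x_i > \theta}\left(\frac{\alpha - x_i}{\alpha - \theta}\right),$$

where we have assumed that observations are arranged in increasing order
of magnitude, $0 \le x_1 < x_2 < \cdots < x_n \le \alpha$. Clearly $L$ is continuous in $\theta$ (even
for $\theta =$ some $x_i$) and differentiable for values of $\theta$ between any two $x_i$'s.
Thus, for $x_j < \theta < x_{j+1}$, we have

$$L(\theta) = \left(\frac{2}{\alpha}\right)^n \theta^{-j}(\alpha - \theta)^{-(n-j)}\prod_{i=1}^{j} x_i\prod_{i=j+1}^{n}(\alpha - x_i),$$

$$\frac{\partial \log L}{\partial\theta} = -\frac{j}{\theta} + \frac{n - j}{\alpha - \theta}, \qquad \text{and} \qquad \frac{\partial^2 \log L}{\partial\theta^2} = \frac{j}{\theta^2} + \frac{n - j}{(\alpha - \theta)^2} > 0.$$

It follows that any stationary value that exists must be a minimum, so that
there can be no maximum in any range $x_j < \theta < x_{j+1}$. Moreover, there can
be no maximum in $0 \le \theta < x_1$ or $x_n < \theta \le \alpha$. This follows since, for
$0 \le \theta < x_1$,

$$L(\theta) = \left(\frac{2}{\alpha}\right)^n(\alpha - \theta)^{-n}\prod_{i=1}^{n}(\alpha - x_i)$$

is a strictly increasing function of $\theta$. By symmetry, $L(\theta)$ is a strictly
decreasing function of $\theta$ in $x_n < \theta \le \alpha$. We conclude that an MLE has to
be one of the observations.

In particular, let $\alpha = 5$ and $n = 3$, and suppose that the observations,
arranged in increasing order of magnitude, are 1, 2, 4. In this case the MLE
can be shown to be $\hat\theta = 1$, which corresponds to the first-order statistic. If
the sample values are 2, 3, 4, the third-order statistic is the MLE.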
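
Since the MLE in Example 8 must be one of the observations, it can be found by evaluating the likelihood at each of them. The sketch below assumes all observations lie strictly between 0 and $\alpha$; the function names are illustrative.

```python
def triangular_likelihood(theta, xs, alpha):
    """Likelihood of theta for the pdf of Example 8 (assumes 0 < x < alpha)."""
    value = (2.0 / alpha) ** len(xs)
    for x in xs:
        if x <= theta:
            value *= x / theta
        else:
            value *= (alpha - x) / (alpha - theta)
    return value

def triangular_mle(xs, alpha):
    """Observation at which the likelihood is largest."""
    return max(xs, key=lambda t: triangular_likelihood(t, xs, alpha))

print(triangular_mle([1.0, 2.0, 4.0], 5.0))   # first order statistic
print(triangular_mle([2.0, 3.0, 4.0], 5.0))   # third order statistic
```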
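Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(r, 1/\beta)$; $\beta > 0$ and $r > 0$
are both unknown. The likelihood function is

$$L(\beta, r; x_1, x_2, \ldots, x_n) = \begin{cases} \dfrac{\beta^{nr}}{\{\Gamma(r)\}^n}\displaystyle\prod_{i=1}^{n} x_i^{r-1}\exp\left(-\beta\sum_{i=1}^{n} x_i\right), & x_i \ge 0, \\ 0, & \text{otherwise}. \end{cases}$$

Then

$$\log L(\beta, r) = nr\log\beta - n\log\Gamma(r) + (r - 1)\sum_{i=1}^{n}\log x_i - \beta\sum_{i=1}^{n} x_i,$$

$$\frac{\partial \log L(\beta, r)}{\partial\beta} = \frac{nr}{\beta} - \sum_{i=1}^{n} x_i = 0,$$

$$\frac{\partial \log L(\beta, r)}{\partial r} = n\log\beta - n\,\frac{\Gamma'(r)}{\Gamma(r)} + \sum_{i=1}^{n}\log x_i = 0.$$

The first of the likelihood equations yields $\hat\beta(x_1, x_2, \ldots, x_n) = \hat{r}/\bar{x}$, while the
second gives

$$n\log\frac{r}{\bar{x}} + \sum_{i=1}^{n}\log x_i - n\,\frac{\Gamma'(r)}{\Gamma(r)} = 0,$$

that is,

$$\log r - \frac{\Gamma'(r)}{\Gamma(r)} = \log\bar{x} - \frac{1}{n}\sum_{i=1}^{n}\log x_i,$$

which is to be solved for $\hat{r}$. In this case, the likelihood equation is not easily
solvable and it is necessary to resort to numerical methods, using tables for
$\Gamma'(r)/\Gamma(r)$.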
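
In place of tables for $\Gamma'(r)/\Gamma(r)$, the equation for $\hat{r}$ in Example 9 can be solved by bisection, since $\log r - \Gamma'(r)/\Gamma(r)$ is strictly decreasing in $r$. The sketch below is one possible approach, not the method of the text: the digamma function is approximated by the standard recurrence and asymptotic series, and the data are made up.

```python
import math

def digamma(x):
    """Gamma'(x) / Gamma(x) for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:                  # psi(x) = psi(x + 1) - 1 / x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))

def gamma_mle(xs, tol=1e-12):
    """Solve the likelihood equations of Example 9 for (r_hat, beta_hat)."""
    n = len(xs)
    mean = sum(xs) / n
    c = math.log(mean) - sum(math.log(x) for x in xs) / n   # right-hand side
    if c <= 0:
        raise ValueError("observations must be positive and not all equal")
    g = lambda r: math.log(r) - digamma(r) - c              # decreasing in r
    lo, hi = 1e-6, 1.0
    while g(hi) > 0:                # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:       # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    r_hat = 0.5 * (lo + hi)
    return r_hat, r_hat / mean      # beta_hat = r_hat / x_bar

data = [0.5, 1.2, 2.3, 0.8, 1.9]
r_hat, beta_hat = gamma_mle(data)
print(r_hat, beta_hat)
```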


Remark 2. We have seen that MLE's may not be unique, although frequently
they are. Also, they are not necessarily unbiased even if a unique
MLE exists. In terms of MSE, an MLE may be worthless. Moreover, MLE's
may not even exist. We have also seen that MLE's are functions of sufficient
statistics. This is a general result, which we now prove.


Theorem 1. Let $T$ be a sufficient statistic for the family of pdf's (pmf's)
$f_\theta(x)$, $\theta \in \Theta$. If an MLE of $\theta$ exists, it is a function of $T$.


Proof. Since $T$ is sufficient, we can write

$$f_\theta(x) = h(x)\,g_\theta(T(x)),$$

using the factorization criterion. Maximization of the likelihood function
with respect to $\theta$ is therefore equivalent to maximization of $g_\theta(T(x))$, which
is a function of $T$ alone.


Remark 3. If the likelihood equations exist and $T$ is sufficient, they
are given by

$$\frac{\partial \log g_\theta(T)}{\partial\theta_j} = 0, \qquad j = 1, 2, \ldots, k,$$

and every nonconstant solution of these equations is a function that depends
on $T$ alone.


Remark 4. Theorem 1 does not say that an MLE is itself a sufficient statistic,
although this will usually be the case. Consider, for example, the case of
sampling from $U[\theta, \theta + 1]$, $\theta \in \mathcal{R}$. Then

$$f_\theta(x) = \begin{cases} 1, & \theta \le \min x_i \le \max x_i \le \theta + 1, \\ 0, & \text{otherwise}, \end{cases}$$

and it follows that $(\min X_i, \max X_i)$ is jointly sufficient for $\theta$. Any value of
$\theta$ satisfying

$$\max x_i - 1 \le \theta \le \min x_i$$

is an MLE. In particular, $\min_{1 \le i \le n} X_i$ is an MLE that is not sufficient.


Theorem 2. Suppose that the regularity conditions of the FCR inequality
are satisfied and $\theta$ belongs to an open interval on the real line. If an estimate
$\hat\theta$ of $\theta$ attains the FCR lower bound for the variance, the likelihood
equation has a unique solution $\hat\theta$ that maximizes the likelihood.


Proof. If $\hat\theta$ attains the FCR lower bound, we have (see Theorem 8.5.1)

$$\frac{\partial \log f_\theta(X)}{\partial\theta} = [k(\theta)]^{-1}[\hat\theta(X) - \theta]$$

with probability 1, and the likelihood equation has a unique solution $\theta = \hat\theta$.
Let us write $A(\theta) = [k(\theta)]^{-1}$. Then

$$\frac{\partial^2 \log f_\theta(X)}{\partial\theta^2} = A'(\theta)(\hat\theta - \theta) - A(\theta),$$

so that

$$\left.\frac{\partial^2 \log f_\theta(X)}{\partial\theta^2}\right|_{\theta = \hat\theta} = -A(\hat\theta).$$

We need only to show that $A(\theta) > 0$. But by (8.5.22)

$$A(\theta) = E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}^2,$$

and the proof is complete.


Remark 5. In Theorem 2 we assumed the differentiability of $A(\theta)$ and the
existence of the second-order partial derivative $\partial^2 \log f_\theta/\partial\theta^2$. If the conditions
of Theorem 2 are satisfied, the most efficient estimate is necessarily
the MLE. It does not follow, however, that every MLE is most efficient.
For example, in sampling from a normal population, $\hat\sigma^2 = \sum_1^n (X_i - \bar{X})^2/n$
is the MLE of $\sigma^2$, but it is not most efficient. Since $\sum_1^n (X_i - \bar{X})^2/\sigma^2$ is
$\chi^2(n - 1)$, we see that $\operatorname{var}(\hat\sigma^2) = 2(n - 1)\sigma^4/n^2$, which is not equal to the
FCR lower bound, $2\sigma^4/n$. Note that $\hat\sigma^2$ is not even an unbiased estimate.


We next consider an important property of MLE's that is not a charac- 
teristic of unbiased estimates. 


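Theorem 3 (Zehna [147]). Let $\{f_\theta: \theta \in \Theta\}$ be a family of pdf's (pmf's),
and let $L(\theta)$ be the likelihood function. Suppose that $\Theta \subset \mathcal{R}_k$, $k \ge 1$. Let
$h: \Theta \to \Lambda$ be a mapping of $\Theta$ onto $\Lambda$, where $\Lambda$ is an interval in $\mathcal{R}_p$ $(1 \le p \le k)$.
If $\hat\theta$ is an MLE of $\theta$, then $h(\hat\theta)$ is an MLE of $h(\theta)$.


Proof. For each $\lambda \in \Lambda$, let us define

$$\Theta_\lambda = \{\theta: \theta \in \Theta,\ h(\theta) = \lambda\}$$

and

$$M(\lambda; x) = \sup_{\theta \in \Theta_\lambda} L(\theta; x).$$

Then $M$ defined on $\Lambda$ is called the likelihood function induced by $h$. If $\hat\theta$
is any MLE of $\theta$, then $\hat\theta$ belongs to one and only one set, $\Theta_{\hat\lambda}$ say. Since
$\hat\theta \in \Theta_{\hat\lambda}$, $\hat\lambda = h(\hat\theta)$. Now

$$M(\hat\lambda; x) = \sup_{\theta \in \Theta_{\hat\lambda}} L(\theta; x) \ge L(\hat\theta; x)$$

and $\hat\lambda$ maximizes $M$, since

$$M(\hat\lambda; x) \le \sup_{\lambda \in \Lambda} M(\lambda; x) = \sup_{\theta \in \Theta} L(\theta; x) = L(\hat\theta; x),$$

so that $M(\hat\lambda; x) = \sup_{\lambda \in \Lambda} M(\lambda; x)$. It follows that $\hat\lambda$ is an MLE of $h(\theta)$, where
$\hat\lambda = h(\hat\theta)$.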

Example 10. Let $X \sim b(1, p)$, $0 \le p \le 1$, and let $h(p) = p(1 - p)$. We wish
to find the MLE of $h(p)$. Note that $\Lambda = [0, \tfrac14]$. The function $h$ is not
one-to-one. The MLE of $p$ based on a sample of size $n$ is $\hat{p}(X_1, \ldots, X_n) = \bar{X}$.
Hence the MLE of parameter $h(p)$ is $h(\bar{X}) = \bar{X}(1 - \bar{X})$.


Example 11. Consider a random sample from $G(1, \beta)$. It is required to find
the MLE of $\beta$ in the following manner. A sample of size $n$ is taken, and it is
known only that $k$, $0 < k < n$, of these observations are $\le M$, where $M$ is
a fixed positive number.

Let $p = P\{X_i \le M\} = 1 - e^{-M/\beta}$, so that $-M/\beta = \log(1 - p)$ and
$\beta = M/\log[1/(1 - p)]$. Therefore the MLE of $\beta$ is $M/\log[1/(1 - \hat{p})]$, where
$\hat{p}$ is the MLE of $p$. To compute the MLE of $p$ we have

$$L(p; x_1, x_2, \ldots, x_n) = p^k(1 - p)^{n-k},$$

so that the MLE of $p$ is $\hat{p} = k/n$. Thus the MLE of $\beta$ is

$$\hat\beta = \frac{M}{\log[n/(n - k)]}.$$
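
The estimate in Example 11 is a direct application of the invariance property of Theorem 3 and is simple to compute. The following sketch uses an illustrative function name and made-up values of $k$, $n$, and $M$.

```python
import math

def mle_beta_from_count(k, n, M):
    """MLE of beta when only the count k of observations <= M is known."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    p_hat = k / n                                  # MLE of p = P{X <= M}
    return M / math.log(1.0 / (1.0 - p_hat))       # equals M / log(n / (n - k))

beta_hat = mle_beta_from_count(k=5, n=10, M=2.0)
print(beta_hat)
```

Substituting the result back into $1 - e^{-M/\beta}$ returns $k/n$, as the invariance argument requires.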


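Finally we consider some important large-sample properties of MLE's. In
the following we assume that $\{f_\theta, \theta \in \Theta\}$ is a family of pdf's (pmf's), where
$\Theta$ is an open interval on $\mathcal{R}$. The conditions listed below are stated when $f_\theta$ is
a pdf. Modifications for the case where $f_\theta$ is a pmf are obvious and will be
left to the reader.


(i) $\partial \log f_\theta/\partial\theta$, $\partial^2 \log f_\theta/\partial\theta^2$, $\partial^3 \log f_\theta/\partial\theta^3$ exist for all $\theta \in \Theta$ and every $x$. Also,

$$\int_{-\infty}^{\infty}\frac{\partial f_\theta(x)}{\partial\theta}\,dx = E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\} = 0 \qquad \text{for all } \theta \in \Theta.$$

(ii) $\displaystyle\int_{-\infty}^{\infty}\frac{\partial^2 f_\theta(x)}{\partial\theta^2}\,dx = 0$ for all $\theta \in \Theta$.

(iii) $\displaystyle -\infty < \int_{-\infty}^{\infty}\frac{\partial^2 \log f_\theta(x)}{\partial\theta^2}\,f_\theta(x)\,dx < 0$ for all $\theta$.

(iv) There exists a function $H(x)$ such that for all $\theta \in \Theta$

$$\left|\frac{\partial^3 \log f_\theta(x)}{\partial\theta^3}\right| < H(x) \qquad \text{and} \qquad \int_{-\infty}^{\infty} H(x)f_\theta(x)\,dx = M(\theta) < \infty.$$

(v) There exists a function $g(\theta)$ that is positive and twice differentiable for
every $\theta \in \Theta$ and a function $H(x)$ such that for all $\theta$

$$\left|\frac{\partial^2}{\partial\theta^2}\left[g(\theta)\,\frac{\partial \log f_\theta}{\partial\theta}\right]\right| < H(x) \qquad \text{and} \qquad \int_{-\infty}^{\infty} H(x)f_\theta(x)\,dx < \infty.$$


Note that condition (v) is equivalent to condition (iv) with the added
qualification that $g(\theta) = 1$.


We state the following results without proof.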
Theorem 4 (Cramér [18]).


(a) Conditions (i), (iii), and (iv) imply that, with probability approaching
1 as $n \to \infty$, the likelihood equation has a consistent solution.

(b) Conditions (i) through (iv) imply that a consistent solution $\hat\theta_n$ of
the likelihood equation is asymptotically normal, that is,

$$\sigma^{-1}\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{L} Z,$$

where $Z$ is $\mathcal{N}(0, 1)$, and

$$\sigma^2 = \left[E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}^2\right]^{-1}.$$


On occasions one encounters examples where the conditions of Theorem 
4 are not satisfied and yet a solution of the likelihood equation is consistent 
and asymptotically normal. 
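
Example 12 (Kulldorf [65]). Let $X \sim \mathcal{N}(0, \theta)$, $\theta > 0$. Let $X_1, X_2, \ldots, X_n$
be $n$ independent observations on $X$. The solution of the likelihood equation
is $\hat\theta_n = \sum_{i=1}^n X_i^2/n$. Also, $EX^2 = \theta$, $\operatorname{var}(X^2) = 2\theta^2$, and

$$E_\theta\left\{\frac{\partial \log f_\theta(X)}{\partial\theta}\right\}^2 = \frac{1}{2\theta^2}.$$

We note that

$$\hat\theta_n \xrightarrow{a.s.} \theta$$

and

$$\sqrt{n}\,(\hat\theta_n - \theta) = \theta\sqrt{2}\;\frac{\sum_1^n X_i^2 - n\theta}{\sqrt{2n}\,\theta} \xrightarrow{L} \mathcal{N}(0, 2\theta^2).$$

However,

$$\frac{\partial^3 \log f_\theta}{\partial\theta^3} = -\frac{1}{\theta^3} + \frac{3x^2}{\theta^4} \to \infty \qquad \text{as } \theta \to 0$$

and is not bounded in $0 < \theta < \infty$. Thus condition (iv) does not hold.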
The following theorem covers such cases also. 
Theorem 5 (Kulldorf [65]).


(a) Conditions (i), (iii), and (v) imply that, with probability approaching
1 as $n \to \infty$, the likelihood equation has a solution.

(b) Conditions (i), (ii), (iii), and (v) imply that a consistent solution of
the likelihood equation is asymptotically normal.


Proof of Theorems 4 and 5. For proofs we refer to Cramér [18], page 500,
and Kulldorf [65].
Remark 6. It is important to note that the results in Theorems 4 and 5
establish the consistency of some root of the likelihood equation but not
necessarily that of the MLE when the likelihood equation has several
roots. Huzurbazar [51] has shown that under certain conditions the
likelihood equation has at most one consistent solution and that the likelihood
function has a relative maximum for such a solution. Since there may be
several solutions for which the likelihood function has relative maxima,
Cramér's and Huzurbazar's results still do not imply that a solution of
the likelihood equation that makes the likelihood function an absolute
maximum is necessarily consistent.

Wald [136] has shown that under certain conditions the MLE is strongly
consistent. It is important to note that Wald does not make any
differentiability assumptions.

In any event, if the MLE is a unique solution of the likelihood equation,
we can use Theorems 4 and 5 to conclude that it is consistent and
asymptotically normal. Note that the asymptotic variance is the same as the lower
bound of the FCR inequality.


Example 13. Consider $X_1, X_2, \ldots, X_n$ iid $P(\lambda)$ rv's, $\lambda \in \Theta = (0, \infty)$. The
likelihood equation has a unique solution, $\hat\lambda(x_1, x_2, \ldots, x_n) = \bar{x}$, which
maximizes the likelihood function. We leave the reader to check that the
conditions of Theorem 4 hold and that the MLE $\bar{X}$ is consistent and
asymptotically normal with mean $\lambda$ and variance $\lambda/n$, a result that is immediate
otherwise.


We leave the reader to check that in Example 12 the conditions of Theorem 5
are satisfied.


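PROBLEMS 8.7


1. Let $X_1, X_2, \ldots, X_n$ be iid rv's with common pmf (pdf) $f_\theta(x)$. Find an MLE for
$\theta$ in each of the following cases.
(a) $f_\theta(x) = 1/\theta$, $x = 1, 2, \ldots, \theta$, $1 \le \theta < \infty$, an integer.
(b) $f_\theta(x) = \tfrac12 e^{-|x - \theta|}$, $-\infty < x < \infty$.
(c) $f_\theta(x) = e^{-(x - \theta)}$, $\theta \le x < \infty$.
(d) $f_\theta(x) = \theta\alpha x^{\alpha - 1} e^{-\theta x^\alpha}$, $x > 0$, and $\alpha$ known.
(e) $f_\theta(x) = \theta(1 - x)^{\theta - 1}$, $0 \le x \le 1$, $\theta \ge 1$.
(f) $f_\theta(x) = [\theta/(1 - \theta)]\, x^{(2\theta - 1)/(1 - \theta)}$, $0 < x < 1$, $\tfrac12 < \theta < 1$.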
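2. Find an MLE, if it exists, in each of the following cases.


(a) $X \sim b(n, \theta)$: both $n$ and $\theta$ are unknown, and one observation is available.
(b) $X_1, X_2, \ldots, X_n$ are iid $b(n, \theta)$ rv's, and both $n$ and $\theta$ are unknown.
(c) $X_1, X_2, \ldots, X_n \sim b(1, \theta)$, $\theta \in [\tfrac12, \tfrac34]$.
(d) $X_1, X_2, \ldots, X_n \sim \mathcal{N}(\theta, \theta^2)$, $\theta \in \mathcal{R}$.

(e) $X_1, X_2, \ldots, X_n$ is a sample from

$$P_\theta\{X = y_1\} = \frac{1 - \theta}{2}, \quad P_\theta\{X = y_2\} = \frac12, \quad P_\theta\{X = y_3\} = \frac{\theta}{2} \quad (0 < \theta < 1).$$

(f) $X_1, X_2, \ldots, X_n \sim \mathcal{N}(\theta, \theta)$, $0 < \theta < \infty$.
(g) $X \sim \mathcal{C}(\theta, 0)$.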
3. Suppose that $n$ observations are taken on an rv $X$ with distribution $\mathcal{N}(\mu, 1)$,
but instead of recording all the observations one notes only whether or not the
observation is less than 0. If $\{X < 0\}$ occurs $m$ $(< n)$ times, find the MLE of $\mu$.
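4. Let $X_1, X_2, \ldots, X_n$ be a random sample from pdf

$$f(x; \alpha, \beta) = \beta^{-1} e^{-\beta^{-1}(x - \alpha)}, \quad \alpha < x < \infty,\ -\infty < \alpha < \infty,\ \beta > 0.$$


(a) Find the MLE of $(\alpha, \beta)$.

(b) Find the MLE of $P_{\alpha,\beta}\{X_1 \ge 1\}$.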

5. Let $X_1, X_2, \ldots, X_n$ be a sample from exponential density $f_\theta(x) = \theta e^{-\theta x}$, $x \ge 0$,
$\theta > 0$. Find the MLE of $\theta$, and show that it is consistent and asymptotically
normal.


6. For Problem 8.6.5 find the MLE for $(\mu, \sigma^2)$.
7. For a sample of size 1 taken from $\mathcal{N}(\mu, \sigma^2)$, show that no MLE of $(\mu, \sigma^2)$ exists.


8. For Problem 5.2.5 suppose that we wish to estimate $N$ on the basis of
observations $X_1, X_2, \ldots, X_M$.

(a) Find the UMVUE of $N$.

(b) Find the MLE of $N$.

(c) Compare the MSE's of the UMVUE and the MLE.

9. Let $X_{ij}$ ($i = 1, 2, \ldots, s$; $j = 1, 2, \ldots, n$) be independent rv's, where
$X_{ij} \sim \mathcal{N}(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, s$. Find MLE's for $\mu_1, \mu_2, \ldots, \mu_s$ and $\sigma^2$. Show
that the MLE for $\sigma^2$ is not consistent as $s \to \infty$ ($n$ fixed).

(Neyman and Scott [86])
10. Let $(X, Y)$ have a bivariate normal distribution with parameters $\mu_1$, $\mu_2$,
$\sigma_1^2$, $\sigma_2^2$, and $\rho$. Suppose that $n$ observations are made on the pair $(X, Y)$, and
$N - n$ observations on $X$; that is, $N - n$ observations on $Y$ are missing. Find the
MLE's of $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. (Anderson [1])
[Hint: If $f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ is the joint pdf of $(X, Y)$, write

$$f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) = f_1(x; \mu_1, \sigma_1^2)\, f_{Y|X}(y; \beta_x, \sigma_2^2(1 - \rho^2)),$$

where $f_1$ is the marginal (normal) pdf of $X$, and $f_{Y|X}$ is the conditional (normal)
pdf of $Y$, given $x$, with mean

$$\beta_x = \left(\mu_2 - \rho\,\frac{\sigma_2}{\sigma_1}\,\mu_1\right) + \rho\,\frac{\sigma_2}{\sigma_1}\,x$$

and variance $\sigma_2^2(1 - \rho^2)$. Maximize the likelihood function first with respect to $\mu_1$
and $\sigma_1^2$ and then with respect to $\mu_2 - \rho(\sigma_2/\sigma_1)\mu_1$, $\rho\sigma_2/\sigma_1$, and $\sigma_2^2(1 - \rho^2)$.]


8.8 BAYES AND MINIMAX ESTIMATION


In this section we consider the problem of point estimation in a general
setting. This discussion falls under the heading of decision theory.

Let $\{f_\theta : \theta \in \Theta\}$ be a family of pdf's (pmf's), and $X_1, X_2, \ldots, X_n$ be a sample
of size $n$ from this distribution. Once the sample point $(x_1, x_2, \ldots, x_n)$ is
observed, the statistician takes an action on the basis of these data. Let us
denote by $\mathcal{A}$ the set of all actions or decisions open to the statistician.


Definition 1. A decision function $d$ is a statistic that takes values in $\mathcal{A}$, that
is, $d$ is a Borel-measurable function that maps $\mathcal{R}_n$ into $\mathcal{A}$.


If $X = x$ is observed, the statistician takes action $d(x) \in \mathcal{A}$.


Example 1. Let $\mathcal{A} = \{a_1, a_2\}$. Then any decision function $d$ divides the
space of values of $(X_1, \ldots, X_n)$, namely, $\mathcal{R}_n$, into a set $C$ and its
complement $C^c$, such that if $x \in C$ we take action $a_1$, and if $x \in C^c$ action $a_2$ is
taken. This is the problem of testing of hypotheses, which we will discuss
in Chapter 9.


Example 2. Let $\mathcal{A} = \Theta$. In this case we face the problem of estimation.


Another element of decision theory is the specification of a loss function,
which measures the loss incurred when we make a decision.


Definition 2. Let $\mathcal{A}$ be an arbitrary space of actions. A nonnegative function
$L$ that maps $\Theta \times \mathcal{A}$ into $\mathcal{R}$ is called a loss function.


The value $L(\theta, a)$ is the loss to the statistician if he takes action $a$ when
$\theta$ is the true parameter value. If we use the decision function $d(X)$, and $L$ is
the loss function and $\theta$ the true parameter value, the loss is the rv $L(\theta, d(X))$.
(As always, we will assume that $L$ is a Borel-measurable function for this to
be true.)


Definition 3. Let $\mathcal{D}$ be a class of decision functions that map $\mathcal{R}_n$ into $\mathcal{A}$,
and let $L$ be a loss function on $\Theta \times \mathcal{A}$. The function $R$ defined on $\Theta \times \mathcal{D}$ by

$$R(\theta, d) = E_\theta L(\theta, d(X)) \tag{1}$$

is known as the risk function associated with $d$ at $\theta$.


Example 3. Let $\mathcal{A} = \Theta \subseteq \mathcal{R}$, $L(\theta, a) = |\theta - a|^2$. Then

$$R(\theta, d) = E_\theta L(\theta, d(X)) = E_\theta\{d(X) - \theta\}^2,$$

which is just the MSE. If we restrict attention to estimates that are
unbiased, the risk is just the variance of the estimate.




The basic problem of decision theory is the following: Given a loss
function $L(\theta, a)$, a decision function $d \in \mathcal{D}$, and the risk function $R(\theta, d)$, what
criteria should we use to choose one decision function in $\mathcal{D}$ over another? An
ideal solution would be to choose $d$ so that $R(\theta, d)$ is minimum for all $\theta \in \Theta$.
Unfortunately this is usually not possible.


Definition 4. The principle of minimax is to choose $d^* \in \mathcal{D}$ so that

$$\max_\theta R(\theta, d^*) \le \max_\theta R(\theta, d) \tag{2}$$

for all $d$ in $\mathcal{D}$. Such a rule $d^*$, if it exists, is called a minimax rule (decision
function).


If the problem is one of estimation, that is, if $\mathcal{A} = \Theta$, we call $d^*$ satisfying
(2) a minimax estimate of $\theta$.


Example 4. Let $X \sim b(1, p)$, $p \in \Theta = \{\tfrac14, \tfrac12\}$, and $\mathcal{A} = \{a_1, a_2\}$. Let the loss
function be defined as follows.


                   a_1     a_2
    p_1 = 1/4       1       4
    p_2 = 1/2       3       2


The set of decision rules includes four functions: $d_1, d_2, d_3, d_4$, defined by
$d_1(0) = d_1(1) = a_1$; $d_2(0) = a_1$, $d_2(1) = a_2$; $d_3(0) = a_2$, $d_3(1) = a_1$; and
$d_4(0) = d_4(1) = a_2$. The risk function takes the following values.


    i    R(p_1, d_i)    R(p_2, d_i)    Max R(p, d_i)
    1         1              3               3
    2        7/4            5/2             5/2
    3       13/4            5/2            13/4
    4         4              2               4


The minimum of the maximum risks is 5/2. Thus the minimax solution is
$d_2(x) = a_1$ if $x = 0$, and $= a_2$ if $x = 1$.
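The risk table of Example 4 can be recomputed mechanically. The sketch below assumes the loss values as read here (losses 1 and 4 for $a_1$, $a_2$ at $p = 1/4$, and 3 and 2 at $p = 1/2$); the dictionary layout and function names are illustrative choices, not part of the text. It enumerates the four nonrandomized rules, computes $R(p, d)$ exactly with rational arithmetic, and picks the rule whose maximum risk is smallest.

```python
from fractions import Fraction as F

# Assumed loss table for Example 4: loss[p][action].
loss = {
    F(1, 4): {"a1": F(1), "a2": F(4)},
    F(1, 2): {"a1": F(3), "a2": F(2)},
}

# The four nonrandomized rules; each maps the observation x in {0, 1} to an action.
rules = {
    "d1": {0: "a1", 1: "a1"},
    "d2": {0: "a1", 1: "a2"},
    "d3": {0: "a2", 1: "a1"},
    "d4": {0: "a2", 1: "a2"},
}

def risk(p, rule):
    """R(p, d) = E_p L(p, d(X)) for X ~ b(1, p)."""
    return (1 - p) * loss[p][rule[0]] + p * loss[p][rule[1]]

# Maximum risk of each rule over the two parameter values.
max_risk = {name: max(risk(p, rule) for p in loss) for name, rule in rules.items()}

# The minimax rule minimizes the maximum risk.
minimax_rule = min(max_risk, key=max_risk.get)
print(max_risk, minimax_rule)
```

Exact fractions are used so that ties, if any, would be detected exactly rather than hidden by rounding.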


The computation of minimax estimates is facilitated if we use the Bayes
estimation method. So far, we have considered $\theta$ as a fixed constant and $f_\theta(x)$
has represented the pdf (pmf) of the rv $X$. In Bayesian estimation we treat
$\theta$ as a random variable distributed according to pdf (pmf) $\pi(\theta)$ on $\Theta$. Also,
$\pi$ is called the a priori distribution. Now $f(x \mid \theta)$ represents the conditional
probability density (or mass) function of rv $X$, given that $\theta \in \Theta$ is held fixed.
Since $\pi$ is the distribution of $\theta$, it follows that the joint density (pmf) of $\theta$
and $X$ is given by

$$f(x, \theta) = \pi(\theta) f(x \mid \theta). \tag{3}$$

In this framework $R(\theta, d)$ is the conditional average loss, $E\{L(\theta, d(X)) \mid \theta\}$,
given that $\theta$ is held fixed. (Note that we are using the same symbol to denote
the rv $\theta$ and a value assumed by it.)
Definition 5. The Bayes risk of a decision function $d$ is defined by

$$R(\pi, d) = E_\pi R(\theta, d). \tag{4}$$

If $\theta$ is a continuous rv and $X$ is of the continuous type, then

$$R(\pi, d) = \int R(\theta, d)\,\pi(\theta)\,d\theta = \iint L(\theta, d(x))\, f(x \mid \theta)\,\pi(\theta)\,dx\,d\theta = \iint L(\theta, d(x))\, f(x, \theta)\,dx\,d\theta. \tag{5}$$

If $\theta$ is discrete with pmf $\pi$ and $X$ is of the discrete type, then

$$R(\pi, d) = \sum_\theta \sum_x L(\theta, d(x))\, f(x, \theta). \tag{6}$$

Similar expressions may be written in the other two cases.
Definition 6. A decision function $d^*$ is known as a Bayes rule (procedure)
if it minimizes the Bayes risk, that is, if

$$R(\pi, d^*) = \inf_d R(\pi, d). \tag{7}$$


Definition 7. The conditional distribution of rv $\theta$, given $X = x$, is called
the a posteriori probability distribution of $\theta$, given the sample.


Let the joint pdf (pmf) be expressed in the form

$$f(x, \theta) = g(x)\, h(\theta \mid x), \tag{8}$$

where $g$ denotes the joint marginal density (pmf) of $X$. The a priori pdf
(pmf) $\pi(\theta)$ gives the distribution of $\theta$ before the sample is taken, and the
a posteriori pdf (pmf) $h(\theta \mid x)$ gives the distribution of $\theta$ after sampling. In
terms of $h(\theta \mid x)$ we may write

$$R(\pi, d) = \int g(x)\left\{\int L(\theta, d(x))\, h(\theta \mid x)\,d\theta\right\} dx \tag{9}$$

or

$$R(\pi, d) = \sum_x g(x)\left\{\sum_\theta L(\theta, d(x))\, h(\theta \mid x)\right\}, \tag{10}$$

depending on whether $f$ and $\pi$ are both continuous or both discrete.
Similar expressions may be written if only one of $f$ and $\pi$ is discrete.


Theorem 1. Consider the problem of estimation of a parameter $\theta \in \Theta \subseteq \mathcal{R}$
with respect to a quadratic loss function $L(\theta, d) = (\theta - d)^2$. A Bayes solution
is given by

$$d(x) = E\{\theta \mid X = x\} \tag{11}$$

($d(x)$ defined by (11) is called the Bayes estimate).

Proof. In the continuous case, if $\pi$ is the prior pdf of $\theta$, then

$$R(\pi, d) = \int g(x)\left\{\int [\theta - d(x)]^2\, h(\theta \mid x)\,d\theta\right\} dx,$$

where $g$ is the marginal pdf of $X$, and $h$ is the conditional pdf of $\theta$, given
$x$. The Bayes rule is a function $d$ that minimizes $R(\pi, d)$. Minimization of
$R(\pi, d)$ is the same as minimization of

$$\int [\theta - d(x)]^2\, h(\theta \mid x)\,d\theta,$$

which is minimum if and only if

$$d(x) = E\{\theta \mid x\}.$$

The proof for the remaining cases is similar.


Remark 1. The argument used in Theorem 1 shows that a Bayes estimate
is one which minimizes $E\{L(\theta, d(X)) \mid X\}$. Theorem 1 is a special case which
says that if $L(\theta, d(X)) = [\theta - d(X)]^2$ the function

$$d(x) = \int \theta\, h(\theta \mid x)\,d\theta$$

is the Bayes estimate for $\theta$ with respect to $\pi$, the a priori distribution on $\Theta$.


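Example 5. Let $X \sim b(n, p)$ and $L(p, d(x)) = [p - d(x)]^2$. Let $\pi(p) = 1$ for
$0 < p < 1$ be the a priori pdf of $p$. Then

$$h(p \mid x) = \frac{\binom{n}{x} p^x (1 - p)^{n - x}}{\int_0^1 \binom{n}{x} p^x (1 - p)^{n - x}\,dp}.$$

It follows that

$$E\{p \mid x\} = \int_0^1 p\, h(p \mid x)\,dp = \frac{x + 1}{n + 2}.$$

Hence the Bayes estimate is

$$d^*(X) = \frac{X + 1}{n + 2}.$$

The Bayes risk is

$$R(\pi, d^*) = \int_0^1 \pi(p) \sum_{x=0}^n [d^*(x) - p]^2 f(x \mid p)\,dp = \int_0^1 E\left\{\left(\frac{X + 1}{n + 2} - p\right)^2 \Bigm| p\right\} dp$$

$$= \frac{1}{(n + 2)^2} \int_0^1 \left[np(1 - p) + (1 - 2p)^2\right] dp = \frac{1}{6(n + 2)}.$$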
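The closed forms in Example 5 can be verified exactly for small $n$. The sketch below is an illustration only; the helper names are hypothetical. It relies on the standard identity $\int_0^1 p^a (1-p)^b\,dp = a!\,b!/(a+b+1)!$ for nonnegative integers $a$, $b$, computes the posterior mean under the uniform prior, and evaluates the Bayes risk by expanding $(d - p)^2$ and integrating term by term.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    """Integral of p**a * (1 - p)**b over [0, 1] for nonnegative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def bayes_estimate(x, n):
    """Posterior mean of p given X = x under the uniform prior on (0, 1)."""
    return beta_int(x + 1, n - x) / beta_int(x, n - x)

def bayes_risk(n):
    """Integral over p of sum_x (d*(x) - p)**2 * C(n, x) * p**x * (1 - p)**(n - x)."""
    total = Fraction(0)
    for x in range(n + 1):
        d = bayes_estimate(x, n)
        # (d - p)^2 = d^2 - 2 d p + p^2, integrated against p^x (1 - p)^(n - x)
        total += comb(n, x) * (
            d * d * beta_int(x, n - x)
            - 2 * d * beta_int(x + 1, n - x)
            + beta_int(x + 2, n - x)
        )
    return total

print(bayes_estimate(3, 10))  # (x + 1)/(n + 2) with x = 3, n = 10
print(bayes_risk(10))         # 1/(6(n + 2)) with n = 10
```

Because the arithmetic is rational, agreement with $(x+1)/(n+2)$ and $1/[6(n+2)]$ is exact for each $n$ tried, although this checks particular values of $n$ rather than proving the general formula.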


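Example 6. Let $X \sim \mathcal{N}(\mu, 1)$, where $X = (X_1, X_2, \ldots, X_n)$ denotes a sample of
size $n$, and let the a priori pdf of $\mu$ be $\mathcal{N}(0, 1)$.
Also, let $L(\mu, d) = [\mu - d(X)]^2$. Then

$$h(\mu \mid x) = \frac{f(x, \mu)}{g(x)} = \frac{\pi(\mu) f(x \mid \mu)}{g(x)},$$

where

$$g(x) = \int f(x, \mu)\,d\mu = \frac{(n + 1)^{-1/2}}{(2\pi)^{n/2}} \exp\left\{-\frac12 \sum_{i=1}^n x_i^2 + \frac{n^2 \bar{x}^2}{2(n + 1)}\right\}.$$

It follows that

$$h(\mu \mid x) = \frac{1}{\sqrt{2\pi/(n + 1)}} \exp\left\{-\frac{n + 1}{2}\left(\mu - \frac{n\bar{x}}{n + 1}\right)^2\right\},$$

and the Bayes estimate is

$$E\{\mu \mid x\} = \frac{n\bar{x}}{n + 1} = \frac{\sum_{i=1}^n x_i}{n + 1}.$$

The Bayes risk is

$$R(\pi, d^*) = \int \pi(\mu) \int [d^*(x) - \mu]^2 f(x \mid \mu)\,dx\,d\mu = \int_{-\infty}^{\infty} E_\mu\left\{\frac{n\bar{X}}{n + 1} - \mu\right\}^2 \pi(\mu)\,d\mu$$

$$= \int_{-\infty}^{\infty} \frac{n + \mu^2}{(n + 1)^2}\,\pi(\mu)\,d\mu = \frac{1}{n + 1}.$$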


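The quadratic loss function used in Theorem 1 is but one example of a loss
function in frequent use. Some other loss functions that may be used are

$$|\theta - d(X)|, \quad \frac{|\theta - d(X)|^2}{|\theta|}, \quad |\theta - d(X)|^4, \quad\text{and}\quad \left(\frac{|\theta - d(X)|}{|\theta| + 1}\right)^{1/2}.$$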


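Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ rv's. It is required to find
a Bayes estimate of $\mu$ of the form $d(x_1, \ldots, x_n) = d(\bar{x})$, where $\bar{x} = \sum_{i=1}^n x_i/n$,
using the loss function $L(\mu, d) = |\mu - d(\bar{x})|$. From the argument used in
the proof of Theorem 1 (or by Remark 1), the Bayes estimate is one that
minimizes the integral $\int |\mu - d(\bar{x})|\, h(\mu \mid \bar{x})\,d\mu$. This will be the case if we choose
$d$ to be the median of the conditional distribution (see Problem 3.2.5).

Let the a priori distribution of $\mu$ be $\mathcal{N}(\theta, \tau^2)$. Since $\bar{X} \sim \mathcal{N}(\mu, \sigma^2/n)$, we
have

$$f(\bar{x}, \mu) = \frac{\sqrt{n}}{2\pi\sigma\tau} \exp\left\{-\frac{(\mu - \theta)^2}{2\tau^2} - \frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right\}.$$

Writing

$$(\bar{x} - \mu)^2 = (\bar{x} - \theta + \theta - \mu)^2 = (\bar{x} - \theta)^2 - 2(\bar{x} - \theta)(\mu - \theta) + (\mu - \theta)^2,$$

we see that the exponent in $f(\bar{x}, \mu)$ is

$$-\frac12\left\{(\mu - \theta)^2\left(\frac{1}{\tau^2} + \frac{n}{\sigma^2}\right) - \frac{2n}{\sigma^2}(\bar{x} - \theta)(\mu - \theta) + \frac{n}{\sigma^2}(\bar{x} - \theta)^2\right\}.$$

It follows that the joint pdf of $\mu$ and $\bar{X}$ is bivariate normal with means $\theta$, $\theta$,
variances $\tau^2$, $\tau^2 + (\sigma^2/n)$, and correlation coefficient $\tau/\sqrt{\tau^2 + (\sigma^2/n)}$. The
marginal of $\bar{X}$ is $\mathcal{N}(\theta, \tau^2 + (\sigma^2/n))$, and the conditional distribution of $\mu$, given
$\bar{x}$, is normal with mean

$$\theta + \frac{\tau^2}{\tau^2 + (\sigma^2/n)}\,(\bar{x} - \theta) = \frac{\theta(\sigma^2/n) + \bar{x}\tau^2}{\tau^2 + (\sigma^2/n)}$$

and variance

$$\tau^2\left[1 - \frac{\tau^2}{\tau^2 + (\sigma^2/n)}\right] = \frac{\tau^2(\sigma^2/n)}{\tau^2 + (\sigma^2/n)}$$

(see the proof of Theorem 5.4.1). The Bayes estimate is therefore the median
of this conditional distribution, and since the distribution is symmetric about
the mean,

$$d^*(\bar{x}) = \frac{\theta(\sigma^2/n) + \bar{x}\tau^2}{\tau^2 + (\sigma^2/n)}$$

is the Bayes estimate.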
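The posterior mean and variance obtained in Example 7 translate directly into a small helper. This is a sketch, not a library routine: the function name and argument names are hypothetical, with `theta` and `tau2` standing for the mean and variance of the normal prior on the population mean, and `sigma2` for the known sampling variance.

```python
def normal_posterior(xbar, n, sigma2, theta, tau2):
    """Posterior mean and variance of mu given the sample mean xbar.

    Model assumed: xbar ~ N(mu, sigma2 / n) with prior mu ~ N(theta, tau2).
    """
    s2 = sigma2 / n  # variance of the sample mean
    mean = (theta * s2 + xbar * tau2) / (tau2 + s2)
    var = tau2 * s2 / (tau2 + s2)
    return mean, var

# With absolute-error loss the Bayes estimate is the posterior median,
# which equals the posterior mean because the posterior is symmetric.
post_mean, post_var = normal_posterior(xbar=2.0, n=4, sigma2=4.0, theta=0.0, tau2=1.0)
print(post_mean, post_var)
```

The estimate is a weighted average of the prior mean and the sample mean, with the weight on the sample mean approaching 1 as $n$ grows.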


The following theorem provides a method for determining minimax
estimates.


Theorem 2. Let $\{f_\theta : \theta \in \Theta\}$ be a family of pdf's (pmf's), and suppose that
an estimate $d^*$ of $\theta$ is a Bayes estimate corresponding to an a priori
distribution $\pi$ on $\Theta$. If the risk function $R(\theta, d^*)$ is constant on $\Theta$, then $d^*$ is a
minimax estimate for $\theta$.

Proof. The proof is left as an exercise.
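Example 8 (Hodges and Lehmann [48]). Let $X \sim b(n, p)$, $0 \le p \le 1$. We
seek a minimax estimate of $p$ of the form $\alpha X + \beta$, using the squared error
loss function. We have

$$R(p, d) = E_p\{\alpha X + \beta - p\}^2 = E_p\{\alpha(X - np) + \beta + (\alpha n - 1)p\}^2$$

$$= [(\alpha n - 1)^2 - \alpha^2 n]\,p^2 + [\alpha^2 n + 2\beta(\alpha n - 1)]\,p + \beta^2,$$

which is a quadratic in $p$. To find $\alpha$ and $\beta$ such that $R(p, d)$ is
constant for all $p \in \Theta$, we set the coefficients of $p^2$ and $p$ equal to 0 to get

$$(\alpha n - 1)^2 - \alpha^2 n = 0 \quad\text{and}\quad \alpha^2 n + 2\beta(\alpha n - 1) = 0.$$

It follows that

$$\alpha = \frac{1}{\sqrt{n}(1 + \sqrt{n})} \quad\text{or}\quad \frac{1}{\sqrt{n}(\sqrt{n} - 1)}$$

and

$$\beta = \frac{1}{2(1 + \sqrt{n})} \quad\text{or}\quad -\frac{1}{2(\sqrt{n} - 1)}.$$

Since $0 \le p \le 1$, we discard the second set of roots for both $\alpha$ and $\beta$, and
then the estimate is of the form

$$d^*(X) = \frac{X}{\sqrt{n}(1 + \sqrt{n})} + \frac{1}{2(1 + \sqrt{n})}.$$

It remains to show that $d^*$ is Bayes against some a priori pdf $\pi$.

Consider the a priori pdf

$$\pi(p) = [B(\alpha', \beta')]^{-1} p^{\alpha' - 1}(1 - p)^{\beta' - 1}, \quad 0 \le p \le 1, \quad \alpha', \beta' > 0.$$

The a posteriori pdf of $p$, given $x$, is expressed by

$$h(p \mid x) = \frac{p^{x + \alpha' - 1}(1 - p)^{n - x + \beta' - 1}}{B(x + \alpha', n - x + \beta')}.$$

It follows that

$$E\{p \mid x\} = \frac{B(x + \alpha' + 1, n - x + \beta')}{B(x + \alpha', n - x + \beta')} = \frac{x + \alpha'}{n + \alpha' + \beta'},$$

which is the Bayes estimate for a squared error loss. For this to be of the
form $d^*$, we must have

$$\frac{1}{\sqrt{n}(1 + \sqrt{n})} = \frac{1}{n + \alpha' + \beta'} \quad\text{and}\quad \frac{1}{2(1 + \sqrt{n})} = \frac{\alpha'}{n + \alpha' + \beta'},$$

giving $\alpha' = \beta' = \sqrt{n}/2$. It follows that the estimate $d^*(X)$ is minimax with
constant risk

$$R(p, d^*) = \frac{1}{4(1 + \sqrt{n})^2} \quad\text{for all } p \in [0, 1].$$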
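Note that the UMVUE (which is also the MLE) is $d(X) = X/n$ with risk
$R(p, d) = p(1 - p)/n$. Comparing the two risks (Figs. 1 and 2), we see that

$$\frac{p(1 - p)}{n} \le \frac{1}{4(1 + \sqrt{n})^2} \quad\text{if and only if}\quad \left|p - \frac12\right| \ge \frac{\sqrt{1 + 2\sqrt{n}}}{2(1 + \sqrt{n})},$$

so that

$$R(p, d^*) < R(p, d)$$

in the interval $(\tfrac12 - a_n, \tfrac12 + a_n)$, where $a_n \to 0$ as $n \to \infty$. Moreover,

$$\frac{\sup_p R(p, d)}{\sup_p R(p, d^*)} = \frac{1/(4n)}{1/[4(1 + \sqrt{n})^2]} = \frac{n + 2\sqrt{n} + 1}{n} \to 1 \quad\text{as } n \to \infty.$$

Clearly we would prefer the minimax estimate if $n$ is small, and would
prefer the UMVUE because of its simplicity if $n$ is large.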
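The comparison in Example 8 is easy to explore numerically. The sketch below uses hypothetical function names and arbitrarily chosen evaluation points; it computes the coefficients of the constant-risk linear estimate, checks that its risk does not depend on $p$, and evaluates the half-width $a_n$ of the interval around $1/2$ on which the minimax estimate has the smaller risk.

```python
from math import sqrt

def minimax_coefficients(n):
    """Coefficients (alpha, beta) of the constant-risk estimate alpha*X + beta."""
    return 1 / (sqrt(n) * (1 + sqrt(n))), 1 / (2 * (1 + sqrt(n)))

def risk_linear(p, n, alpha, beta):
    """Exact risk of alpha*X + beta for X ~ b(n, p): variance plus squared bias."""
    return alpha**2 * n * p * (1 - p) + (beta + (alpha * n - 1) * p) ** 2

def risk_umvue(p, n):
    """Risk of X/n under squared error loss."""
    return p * (1 - p) / n

def risk_minimax(n):
    """Constant risk of the minimax estimate."""
    return 1 / (4 * (1 + sqrt(n)) ** 2)

def half_width(n):
    """a_n: the minimax estimate has smaller risk when |p - 1/2| < a_n."""
    return sqrt(1 + 2 * sqrt(n)) / (2 * (1 + sqrt(n)))

for n in (1, 9, 100, 10000):
    alpha, beta = minimax_coefficients(n)
    ratio = risk_umvue(0.5, n) / risk_minimax(n)  # sup risk of X/n over minimax risk
    print(n, risk_minimax(n), half_width(n), ratio)
```

For $n = 9$ the constant risk is $1/64$, and the printed ratio of maximum risks decreases toward 1 as $n$ grows, in line with the remark that the simpler estimate $X/n$ loses little for large $n$.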
Fig. 1. Comparison of R(p, d) and R(p, d*) for n = 1.

Fig. 2. Comparison of R(p, d) and R(p, d*) for n = 9.
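Example 9 (Hodges and Lehmann [48]). A lot contains $N$ elements, of
which $D$ are defective. A random sample of size $n$ produces $X$ defectives.
We wish to estimate $D$. Clearly

$$P_D\{X = k\} = \frac{\binom{D}{k}\binom{N - D}{n - k}}{\binom{N}{n}},$$

$$E_D X = n\,\frac{D}{N}, \quad\text{and}\quad \sigma_D^2 = \frac{nD(N - n)(N - D)}{N^2(N - 1)}.$$

Proceeding as in Example 8, we find a linear function of $X$ with constant
risk. Indeed, $E_D(\alpha X + \beta - D)^2 = \beta^2$ when

$$\alpha = \frac{N}{n + \sqrt{n(N - n)/(N - 1)}}, \qquad \beta = \frac{N}{2}\left(1 - \frac{\alpha n}{N}\right).$$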

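We show that $\alpha X + \beta$ is the Bayes estimate corresponding to a priori pmf

$$P\{D = d\} = c\int_0^1 \binom{N}{d} p^d (1 - p)^{N - d}\, p^{a - 1}(1 - p)^{b - 1}\,dp,$$

where $a, b > 0$ and $c = \Gamma(a + b)/[\Gamma(a)\Gamma(b)]$. First note that $\sum_{d=0}^N P\{D = d\}
= 1$ so that

$$\sum_{d=0}^N \binom{N}{d}\,\frac{\Gamma(a + b)\,\Gamma(a + d)\,\Gamma(N + b - d)}{\Gamma(a)\,\Gamma(b)\,\Gamma(N + a + b)} = 1.$$

The Bayes estimate is given by

$$d^*(k) = \frac{\sum_d d\,\binom{d}{k}\binom{N - d}{n - k}\binom{N}{d}\,\Gamma(a + d)\,\Gamma(N + b - d)}{\sum_d \binom{d}{k}\binom{N - d}{n - k}\binom{N}{d}\,\Gamma(a + d)\,\Gamma(N + b - d)}.$$

A little simplification, writing $d = (d - k) + k$ and using

$$\binom{d}{k}\binom{N - d}{n - k}\binom{N}{d} = \binom{N - n}{d - k}\binom{n}{k}\binom{N}{n},$$

yields

$$d^*(k) = k + \frac{\sum_d (d - k)\binom{N - n}{d - k}\,\Gamma(d + a)\,\Gamma(N + b - d)}{\sum_d \binom{N - n}{d - k}\,\Gamma(d + a)\,\Gamma(N + b - d)} = k\,\frac{a + b + N}{a + b + n} + \frac{a(N - n)}{a + b + n}.$$

Now putting

$$\alpha = \frac{a + b + N}{a + b + n} \quad\text{and}\quad \beta = \frac{a(N - n)}{a + b + n}$$

and solving for $a$ and $b$, we get

$$a = \frac{\beta}{\alpha - 1} \quad\text{and}\quad b = \frac{N - \alpha n - \beta}{\alpha - 1}.$$

Since $a > 0$, $\beta > 0$, and since $b > 0$, $N > \alpha n + \beta$. Moreover, $\alpha > 1$ if
$N > n + 1$. If $N = n + 1$, the result is obtained if we give $D$ a binomial
distribution with parameter $p = \tfrac12$. If $N = n$, the result is immediate.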
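The constant-risk claim of Example 9 can be checked by direct enumeration of the hypergeometric pmf. The following sketch uses hypothetical function names and an arbitrary choice of lot size and sample size; it computes $\alpha$ and $\beta$ from the formulas above and evaluates $E_D(\alpha X + \beta - D)^2$ for every possible number of defectives.

```python
from math import comb, sqrt

def hyper_pmf(k, N, D, n):
    """P_D{X = k}: k defectives in a sample of n drawn from N items, D defective."""
    # math.comb returns 0 when the lower index exceeds the upper one.
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)

def constant_risk_coefficients(N, n):
    """Coefficients (alpha, beta) making the risk of alpha*X + beta free of D."""
    alpha = N / (n + sqrt(n * (N - n) / (N - 1)))
    beta = (N / 2) * (1 - alpha * n / N)
    return alpha, beta

def risk(D, N, n, alpha, beta):
    """E_D (alpha*X + beta - D)**2 by summing over the support of X."""
    return sum(
        hyper_pmf(k, N, D, n) * (alpha * k + beta - D) ** 2 for k in range(n + 1)
    )

N, n = 20, 5
alpha, beta = constant_risk_coefficients(N, n)
risks = [risk(D, N, n, alpha, beta) for D in range(N + 1)]
print(alpha, beta, beta**2, min(risks), max(risks))
```

Up to floating-point rounding, every entry of `risks` equals `beta**2`, which is the constant risk asserted in the example.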


PROBLEMS 8.8


1. It rains quite often in Bowling Green, Ohio. On a rainy day a teacher has
essentially three choices: (1) to take an umbrella and face the possible prospect of
carrying it around in the sunshine; (2) to leave the umbrella at home and perhaps
get drenched; or (3) to just give up the lecture and stay at home. Let $\Theta = \{\theta_1, \theta_2\}$,
where $\theta_1$ corresponds to rain, and $\theta_2$ to no rain. Let $\mathcal{A} = \{a_1, a_2, a_3\}$, where $a_i$
corresponds to the choice $i$, $i = 1, 2, 3$. Suppose that the following table gives the
losses for the decision problem:


             θ_1    θ_2
    a_1       1      2
    a_2       4      0
    a_3       5      5

The teacher has to make a decision on the basis of a weather report that depends
on $\theta$ as follows.


КОШ ER Be 
_W,(rain) 44 2 
И (по rain) 3/5 |+ М8 



Find the minimax rule to help the teacher reach a decision.


2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. For estimating $\lambda$, using the
quadratic error loss function, an a priori distribution over $\Theta$, given by pdf

$$\pi(\lambda) = e^{-\lambda} \ \text{ if } \lambda > 0, \qquad = 0 \ \text{ otherwise},$$

is used.


(a) Find the Bayes estimate for $\lambda$.


(b) If it is required to estimate $\varphi(\lambda) = e^{-\lambda}$ with the same loss function and same
a priori pdf, find the Bayes estimate for $\varphi(\lambda)$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, \theta)$. Consider the class of decision rules
$d$ of the form $d(x_1, x_2, \ldots, x_n) = n^{-1}\sum_{i=1}^n x_i + \alpha$, where $\alpha$ is a constant to be
determined. Find $\alpha$ according to the minimax principle, using the loss function $(\theta - d)^2$,
where $d$ is an estimate for $\theta$.


4. Let $d^*$ be a minimax estimate for $\psi(\theta)$ with respect to the squared error loss
function. Show that $ad^* + b$ ($a$, $b$ constants) is a minimax estimate for $a\psi(\theta) + b$.


5. Let $X \sim b(n, \theta)$, and suppose that the a priori pdf of $\theta$ is $U(0, 1)$. Find the Bayes
estimate of $\theta$, using loss function $L(\theta, d) = (\theta - d)^2/[\theta(1 - \theta)]$. Find a minimax
estimate for $\theta$.


6. In Example 5 find the Bayes estimate for $p^2$.


7. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(1, 1/\lambda)$. To estimate $\lambda$, let the a
priori pdf on $\lambda$ be $\pi(\lambda) = e^{-\lambda}$, $\lambda > 0$, and let the loss function be squared error.
Find the Bayes estimate of $\lambda$.


8.9 MINIMAL SUFFICIENT STATISTIC


In Section 3 we alluded to the fact that a given family of probability
distributions that admits a nontrivial sufficient statistic usually admits several
sufficient statistics. Clearly we would like to be able to choose the sufficient
statistic that results in the greatest reduction of data collection. In this
section we study the notion of a minimal sufficient statistic. For this purpose
it is convenient to introduce the notion of a sufficient partition. The reader
will recall that a partition of a space $\mathfrak{X}$ is just a collection of disjoint sets $E_\alpha$
such that $\sum_\alpha E_\alpha = \mathfrak{X}$. Any statistic $T(X_1, X_2, \ldots, X_n)$ induces a partition of
the space of values of $(X_1, X_2, \ldots, X_n)$, that is, $T$ induces a covering of $\mathfrak{X}$ by
a family $\mathfrak{A}$ of disjoint sets $A_t = \{(x_1, x_2, \ldots, x_n) \in \mathfrak{X} : T(x_1, x_2, \ldots, x_n) = t\}$,
where $t$ belongs to the range of $T$. The sets $A_t$ are called partition sets.
Conversely, given a partition, any assignment of a number to each set so
that no two partition sets have the same number assigned defines a statistic.
Clearly this function is not, in general, unique.


Definition 1. Let $\{F_\theta : \theta \in \Theta\}$ be a family of df's, and $X = (X_1, X_2, \ldots, X_n)$
be a sample from $F_\theta$. Let $\mathfrak{A}$ be a partition of the sample space induced by
a statistic $T = T(X_1, X_2, \ldots, X_n)$. We say that $\mathfrak{A} = \{A_t : t$ is in the range
of $T\}$ is a sufficient partition for $\theta$ (or the family $\{F_\theta : \theta \in \Theta\}$) if the
conditional distribution of $X$, given $T = t$, does not depend on $\theta$ for any
$A_t$, provided that the conditional probability is well defined.


Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's. The sample space of values
of $(X_1, X_2, \ldots, X_n)$ is the set of $n$-tuples $(x_1, x_2, \ldots, x_n)$, where each $x_i = 0$ or
$= 1$, and consists of $2^n$ points. Let $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i$, and consider
the partition $\mathfrak{A} = \{A_0, A_1, \ldots, A_n\}$, where $x \in A_j$ if and only if $\sum_{i=1}^n x_i = j$,
$0 \le j \le n$. Each $A_j$ contains $\binom{n}{j}$ sample points. The conditional probability

$$P_p\{x \mid A_j\} = \frac{P_p\{x\}}{P_p(A_j)} = \binom{n}{j}^{-1} \quad\text{if } x \in A_j,$$

and we see that $\mathfrak{A}$ is a sufficient partition.
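Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $U[0, \theta]$ rv's. Consider the statistic
$T(X) = \max_{1 \le i \le n} X_i$. The space of values of $X_1, X_2, \ldots, X_n$ is the set of
points $\{x : 0 \le x_i \le \theta,\ i = 1, 2, \ldots, n\}$. $T$ induces a partition $\mathfrak{A}$ on this set.
The sets of this partition are $A_t = \{(x_1, x_2, \ldots, x_n) : \max(x_1, \ldots, x_n) = t\}$,
$t \in [0, \theta]$.

We have

$$f_\theta(x \mid t) = \frac{f_\theta(x)}{f_\theta^T(t)} \quad\text{if } x \in A_t,$$

where $f_\theta^T(t)$ is the pdf of $T$. We have

$$f_\theta(x \mid t) = \frac{1/\theta^n}{n t^{n-1}/\theta^n} = \frac{1}{n t^{n-1}} \quad\text{if } x \in A_t.$$

It follows that $\mathfrak{A} = \{A_t\}$ defines a sufficient partition.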


Remark 1. Clearly a sufficient statistic $T$ for a family of df's $\{F_\theta : \theta \in \Theta\}$
induces a sufficient partition; and, conversely, given a sufficient partition,
we can define a sufficient statistic (not necessarily uniquely) for the family.


Remark 2. Two statistics $T_1$, $T_2$ that define the same partition must be in
one-to-one correspondence, that is, there exists a function $h$ such that
$T_1 = h(T_2)$ with a unique inverse, $T_2 = h^{-1}(T_1)$. It follows that if $T_1$ is
sufficient every one-to-one function of $T_1$ is also sufficient.


Let $\mathfrak{A}_1$, $\mathfrak{A}_2$ be two partitions of a space $\mathfrak{X}$. We say that $\mathfrak{A}_1$ is a subpartition
of $\mathfrak{A}_2$ if every partition set in $\mathfrak{A}_2$ is a union of sets of $\mathfrak{A}_1$. We sometimes say
also that $\mathfrak{A}_1$ is finer than $\mathfrak{A}_2$ ($\mathfrak{A}_2$ is coarser than $\mathfrak{A}_1$) or that $\mathfrak{A}_2$ is a reduction
of $\mathfrak{A}_1$. In this case, a statistic $T_2$ that defines $\mathfrak{A}_2$ must be a function of any
statistic $T_1$ that defines $\mathfrak{A}_1$. Clearly, this function need not have a unique
inverse unless the two partitions have exactly the same partition sets.
Given a family of distributions $\{F_\theta : \theta \in \Theta\}$ for which a sufficient partition
exists, we seek to find a sufficient partition $\mathfrak{A}$ that is as coarse as possible,
that is, any reduction of $\mathfrak{A}$ leads to a partition that is not sufficient.


Definition 2. A partition $\mathfrak{A}$ is said to be minimal sufficient if
(i) $\mathfrak{A}$ is a sufficient partition, and
(ii) if $\mathfrak{C}$ is any sufficient partition, $\mathfrak{C}$ is a subpartition of $\mathfrak{A}$.

The question of the existence of the minimal partition was settled by
Lehmann and Scheffé [69] and, in general, involves measure-theoretic
considerations. However, in the cases that we consider where the sample space
is either discrete or a finite-dimensional Euclidean space, and the family of
distributions of $X$ is defined by a family of pdf's (pmf's) $\{f_\theta, \theta \in \Theta\}$, such
difficulties do not arise. The construction may be described as follows:

Two points $x$ and $y$ in the sample space are said to be equivalent, and we
write $x \sim y$, if and only if the ratio $f_\theta(x)/f_\theta(y)$ does not depend on $\theta$
(whenever the ratio is defined). We leave the reader to check that "$\sim$" is an
equivalence relation (that is, it is reflexive, symmetric, and transitive) and
hence "$\sim$" defines a partition of the sample space. This partition defines the
minimal sufficient partition.
Example 3. Consider again Example 1. Then

$$\frac{f_p(x)}{f_p(y)} = p^{\sum x_i - \sum y_i}\,(1 - p)^{-\sum x_i + \sum y_i},$$

and this ratio is independent of $p$ if and only if

$$\sum_{i=1}^n x_i = \sum_{i=1}^n y_i,$$

so that $x \sim y$ if and only if $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. It follows that the partition
$\mathfrak{A} = \{A_0, A_1, \ldots, A_n\}$, where $x \in A_j$ if and only if $\sum_{i=1}^n x_i = j$, introduced
in Example 1, is minimal sufficient.
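The construction of the equivalence classes in Example 3 can be imitated on a computer for small $n$. The sketch below is only a numerical stand-in for the definition: it treats two points as equivalent when the likelihood ratio takes the same exact value at each point of a small grid of $p$ values, which is an assumption made for illustration (agreement on a finite grid does not in general prove independence of $p$). The grid and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def likelihood(x, p):
    """Joint pmf of iid b(1, p) observations at the point x."""
    s = sum(x)
    return p**s * (1 - p) ** (len(x) - s)

def equivalent(x, y, grid):
    """Treat x ~ y when f_p(x)/f_p(y) is the same at every p in the grid."""
    ratios = {likelihood(x, p) / likelihood(y, p) for p in grid}
    return len(ratios) == 1

def partition(n, grid):
    """Group the 2**n sample points into classes of mutually equivalent points."""
    classes = []
    for x in product((0, 1), repeat=n):
        for cls in classes:
            if equivalent(x, cls[0], grid):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

grid = [Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)]
classes = partition(3, grid)
for cls in classes:
    print(sum(cls[0]), cls)
```

For $n = 3$ the points fall into four classes, one for each value of the sum of the coordinates, which is the partition $\{A_0, A_1, A_2, A_3\}$ of Example 1.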


A rigorous proof of the above assertion is beyond the scope of this book. 
The basic ideas are outlined in the following theorem. 


Theorem 1. The relation "$\sim$" defined above induces a minimal sufficient
partition.


Proof. If $T$ is a sufficient statistic, we have to show that $x \sim y$ whenever
$T(x) = T(y)$. This will imply that every set of the minimal sufficient
partition is a union of sets of the form $A_t = \{T = t\}$, proving condition (ii) of
Definition 2.

Sufficiency of $T$ means that, whenever $x \in A_t$, then

$$f_\theta\{x \mid T = t\} = \frac{f_\theta(x)}{f_\theta^T(t)} \quad\text{if } x \in A_t$$

is independent of $\theta$. It follows that, if both $x$ and $y \in A_t$, then

$$\frac{f_\theta(x \mid t)}{f_\theta(y \mid t)} = \frac{f_\theta(x)}{f_\theta(y)}$$

is independent of $\theta$, and hence $x \sim y$.
To prove the sufficiency of the minimal sufficient partition $\mathfrak{A}$, let $T_1$ be an
rv that induces $\mathfrak{A}$. Then $T_1$ takes on distinct values over distinct sets of $\mathfrak{A}$,
but remains constant on the same set. If $x \in \{T_1 = t_1\}$, then

$$f_\theta(x \mid T_1 = t_1) = \frac{f_\theta(x)}{P_\theta\{T_1 = t_1\}}. \tag{1}$$

Now

$$P_\theta\{T_1 = t_1\} = \int_{\{y:\,T_1(y) = t_1\}} f_\theta(y)\,dy \quad\text{or}\quad \sum_{\{y:\,T_1(y) = t_1\}} f_\theta(y),$$

depending on whether the joint distribution of $X$ is absolutely continuous or
discrete. Since $f_\theta(x)/f_\theta(y)$ is independent of $\theta$ whenever $x \sim y$, it follows that
the ratio on the right-hand side of (1) does not depend on $\theta$. Thus $T_1$ is
sufficient.


Definition 3. A statistic that induces the minimal sufficient partition is 
called a minimal sufficient statistic. 


In view of Remark 2, a minimal sufficient statistic is not unique; every
one-to-one function of a minimal sufficient statistic is itself minimal sufficient.
In view of Theorem 1 a minimal sufficient statistic is a function of every
sufficient statistic. We caution once again that not every function of a sufficient
statistic is sufficient. It was shown by Lehmann and Scheffé [69] that, if $T_1(X)$
is minimal sufficient for $\theta$, $\theta \in \Theta$, then $T_2(X)$ is sufficient if and only if there exist a
Borel set $B$ with $P_\theta B = 1$ for all $\theta \in \Theta$ and a Borel-measurable function $g$ such
that $T_1(x) = g(T_2(x))$ for all $x \in B$. This result can sometimes be used to test
the sufficiency of a statistic. The following example is motivated by Harris
and Soms [47].


Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$. Then the statistic
$T_1(X) = \sum_{i=1}^n X_i$ is easily shown to be sufficient, indeed minimal sufficient,
for $\mu$. We show that $T_2(X) = \sum_{i=1}^n X_i^2$ is not sufficient for $\mu$. To establish
the sufficiency of $T_2$, we have to show the existence of a Borel set $B$ with
$P_\mu(B) = 1$ and a Borel-measurable function $g$ such that $T_1(x) = g(T_2(x))$ for
$x \in B$, that is,

$$\sum_{i=1}^n x_i = g\left(\sum_{i=1}^n x_i^2\right), \quad x \in B.$$

Differentiating with respect to $x_i$ ($i = 1, 2, \ldots, n$), we get

$$\frac{\partial T_1(x)}{\partial x_i} = 1 = 2x_i\, g'(T_2(x)), \quad i = 1, 2, \ldots, n, \quad x \in B.$$

It follows that $g$ must satisfy

$$n = 2\sum_{i=1}^n x_i\, g'(T_2(x)) = 2g(T_2(x))\, g'(T_2(x));$$

hence

$$nT_2(x) = [g(T_2(x))]^2, \quad x \in B,$$

where we have used the fact that $T_1(0) = T_2(0) = 0$. Thus we have that such
a $g$ exists if and only if

$$n\sum_{i=1}^n x_i^2 = \left(\sum_{i=1}^n x_i\right)^2, \quad x \in B,$$

and therefore if and only if all the components of $x$ are equal, or in the
trivial case when $n = 1$. It follows that $T_2$ is not sufficient.


PROBLEMS 8.9 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with law $\mathcal{L}(X)$. Find
a minimal sufficient statistic in each of the following cases.
(a) $X \sim P(\lambda)$.

(b) $X \sim U[0, \theta]$.

(c) $X \sim NB(1; p)$.

(d) $X \sim P_N$, where $P_N\{X = k\} = 1/N$ if $k = 1, 2, \ldots, N$, and $= 0$ otherwise.

(e) $X \sim \mathcal{N}(\mu, \sigma^2)$.

(f) $X \sim G(\alpha, \beta)$.

(g) $X \sim B(\alpha, \beta)$.

(h) $X \sim f_\theta(x)$, where $f_\theta(x) = (2/\theta^2)(\theta - x)$, $0 < x < \theta$.

2. Let $X_1, X_2$ be a sample of size 2 from $P(\lambda)$. Show that the statistic $X_1 + \alpha X_2$,
where $\alpha > 1$ is an integer, is not sufficient for $\lambda$.


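3. Let $X_1, X_2, \ldots, X_n$ be a sample from the pdf

$$f_\theta(x) = \begin{cases} \dfrac{x}{\theta}\, e^{-x^2/(2\theta)} & \text{if } x > 0, \\[4pt] 0 & \text{if } x \le 0, \end{cases} \qquad \theta > 0.$$

Show that $\sum_{i=1}^n X_i^2$ is a minimal sufficient statistic for $\theta$, but $\sum_{i=1}^n X_i$ is not sufficient.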
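4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. Show that $\sum_{i=1}^n X_i^2$ is a minimal
sufficient statistic but $\sum_{i=1}^n X_i$ is not sufficient for $\sigma^2$.
5. Let $X_1, X_2, \ldots, X_n$ be a sample from pdf $f_{\alpha,\beta}(x) = \beta e^{-\beta(x - \alpha)}$ if $x > \alpha$, and $= 0$
if $x \le \alpha$. Find a minimal sufficient statistic for $(\alpha, \beta)$.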




CHAPTER 9 


Neyman-Pearson Theory 
of Testing of Hypotheses 


9.1 INTRODUCTION 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a population distribution $F_\theta$,
$\theta \in \Theta$, where the functional form of $F_\theta$ is known except, perhaps, for the
parameter $\theta$. Thus, for example, the $X_i$'s may be a random sample from
$\mathcal{N}(\theta, 1)$, where $\theta \in \mathcal{R}$ is not known. In many practical problems the
experimenter is interested in testing the validity of an assertion about the unknown
parameter $\theta$. For example, in a coin-tossing experiment it is of interest to
test, in some sense, whether the (unknown) probability of heads $p$ equals a
given number $p_0$, $0 < p_0 < 1$. Similarly, it is of interest to check the claim
of a car manufacturer about the average mileage per gallon of gasoline
achieved by a particular model. A problem of this type is usually referred to
as a problem of testing of hypotheses and is the subject of discussion in this
chapter. We will develop the fundamentals of Neyman-Pearson theory. In
Section 2 we introduce the various concepts involved. In Section 3 the
fundamental Neyman-Pearson lemma is proved, and Sections 4 and 5 deal with
some basic results in the testing of composite hypotheses.


9.2 SOME FUNDAMENTAL NOTIONS OF HYPOTHESES TESTING


In Chapter 8 we discussed the problem of point estimation in sampling from
a population whose distribution is known except for a finite number of
unknown parameters. Here we consider another important problem in statistical
inference, the testing of statistical hypotheses. We begin by considering the
following examples.
Example 1. In coin-tossing experiments one frequently assumes that the
coin is fair, that is, the probability of getting heads or tails is the same: $\tfrac12$.
How does one test whether the coin is fair (unbiased) or loaded (biased)? If
one is guided by intuition, a reasonable procedure would be to toss the coin
$n$ times, say, and count the number of heads. If the proportion of heads
observed does not deviate "too much" from $p = \tfrac12$, one would tend to
conclude that the coin is fair.


Example 2. It is usual for manufacturers to make quantitative assertions
about their products. For example, a manufacturer of 12 volt batteries may
claim that a certain brand of his batteries lasts for $N$ hours. How does one
go about checking the truth of this assertion? A reasonable procedure
suggests itself: Take a random sample of $n$ batteries of the brand in
question, and note their length of life under more or less identical conditions. If
the average length of life is "much smaller" than $N$, one would tend to
doubt the manufacturer's claim.


To fix ideas, let us define formally the concepts involved. As usual, $X =
(X_1, X_2, \ldots, X_n)$ and let $X \sim f_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$. It will be assumed that the
functional form of $f_\theta$ is known except for the parameter $\theta$. Also, we assume
that $\Theta$ contains at least two points.


Definition 1. A parametric hypothesis is an assertion about the unknown
parameter $\theta$. It is usually referred to as the null hypothesis, $H_0 : \theta \in \Theta_0 \subset \Theta$.
The statement $H_1 : \theta \in \Theta_1 = \Theta - \Theta_0$ is usually referred to as the alternative
hypothesis.


Definition 2. If $\Theta_0$ ($\Theta_1$) contains only one point, we say that $\Theta_0$ ($\Theta_1$) is simple;
otherwise, composite. Thus, if a hypothesis is simple, the probability
distribution of $X$ is completely specified under the hypothesis.


Example 3. Let $X \sim \mathcal{N}(\mu, \sigma^2)$. If both $\mu$ and $\sigma^2$ are unknown, $\Theta =
\{(\mu, \sigma^2): -\infty < \mu < \infty,\ \sigma^2 > 0\}$. The hypothesis $H_0: \mu \le \mu_0,\ \sigma^2 > 0$,
where $\mu_0$ is a known constant, is a composite null hypothesis. The alternative
hypothesis is $H_1: \mu > \mu_0,\ \sigma^2 > 0$, which is also composite. Similarly, the
null hypothesis $\mu = \mu_0,\ \sigma^2 > 0$ is also composite.

If $\sigma^2 = \sigma_0^2$ is known, the hypothesis $H_0: \mu = \mu_0$ is a simple hypothesis.

Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's. Some hypotheses of
interest are $p = \frac{1}{2}$, $p \le \frac{1}{2}$, $p \ge \frac{1}{2}$ or, quite generally, $p = p_0$, $p \le p_0$,
$p \ge p_0$, where $p_0$ is a known number, $0 < p_0 < 1$.


The problem of testing of hypotheses may be described as follows: Given
the sample point $x = (x_1, x_2, \ldots, x_n)$, find a decision rule (function) that
will lead to a decision to accept or reject the null hypothesis. In other
words, partition the space $\mathfrak{X}_n$ into two disjoint sets $C$ and $C^c$ such that, if
$x \in C$, we reject $H_0: \theta \in \Theta_0$ (and accept $H_1$), and if $x \in C^c$, we accept $H_0$ that
$X \sim f_\theta$, $\theta \in \Theta_0$.


Definition 3. Let $X \sim f_\theta$, $\theta \in \Theta$. A subset $C$ of $\mathfrak{X}_n$ such that, if $x \in C$, then
$H_0$ is rejected (with probability 1) is called the critical region (set):
$C = \{x \in \mathfrak{X}_n: H_0 \text{ is rejected if } x \in C\}$.


There are two types of errors that can be made if one uses such a procedure.
One may reject $H_0$ when in fact it is true, called a type I error, or
accept $H_0$ when it is false, called a type II error.


                           True
                     $H_0$            $H_1$
    Accept   $H_0$   Correct          Type II error
             $H_1$   Type I error     Correct


If $C$ is the critical region of a rule, $P_\theta C$, $\theta \in \Theta_0$, is a probability of type I
error, and $P_\theta C^c$, $\theta \in \Theta_1$, is a probability of type II error. Ideally one would
like to find a critical region for which both these probabilities are 0. This
will be the case if we can find a subset $S \subseteq \mathfrak{X}_n$ such that $P_\theta S = 1$ for every
$\theta \in \Theta_0$ and $P_\theta S = 0$ for every $\theta \in \Theta_1$. Unfortunately situations such as this
do not arise in practice, although they are conceivable. For example, let
X~ «(1, 0) under Hp and. X ~ P(0) under Н... Usually, if a critical region 
is such that the probability of type I error is 0, it will be of the form "always
accept $H_0$" and the probability of type II error will then be 1.

The procedure used in practice is to limit the probability of type I error
to some preassigned level $\alpha$ (usually .01 or .05) that is small and to minimize
the probability of type II error. To reformulate our problem in terms of this
requirement, let us formalize these notions.


Definition 4. Every Borel-measurable mapping $\varphi$ of $\mathfrak{X}_n \to [0, 1]$ is known
as a test function.


Some simple examples of test functions are $\varphi(x) = 1$ for all $x \in \mathfrak{X}_n$, $\varphi(x)
= 0$ for all $x \in \mathfrak{X}_n$, or $\varphi(x) = \alpha$, $0 \le \alpha \le 1$, for all $x \in \mathfrak{X}_n$. In fact, Definition 4
includes Definition 3 in the sense that, whenever $\varphi$ is the indicator
function of some Borel subset $A$ of $\mathfrak{X}_n$, $A$ is called the critical region (of the
test $\varphi$).

Definition 5. The mapping $\varphi$ is said to be a test of hypothesis $H_0: \theta \in \Theta_0$
against the alternatives $H_1: \theta \in \Theta_1$ with error probability $\alpha$ (also called
level of significance or, simply, level) if

    (1)    $E_\theta \varphi(X) \le \alpha$    for all $\theta \in \Theta_0$.


We shall say, in short, that $\varphi$ is a test for the problem $(\alpha, \Theta_0, \Theta_1)$.


Let us write $\beta_\varphi(\theta) = E_\theta \varphi(X)$. Our object, in practice, will be to seek a test
$\varphi$ for a given $\alpha$, $0 \le \alpha \le 1$, such that

    (2)    $\sup_{\theta \in \Theta_0} \beta_\varphi(\theta) \le \alpha.$


The left-hand side of (2) is usually known as the size of the test $\varphi$. Condition
(1) therefore restricts attention to tests whose size does not exceed a given
level of significance $\alpha$.

The following interpretation may be given to all tests $\varphi$ satisfying
$\beta_\varphi(\theta) \le \alpha$ for all $\theta \in \Theta_0$. To every $x \in \mathfrak{X}_n$ we assign a number $\varphi(x)$,
$0 \le \varphi(x) \le 1$, which is the probability of rejecting $H_0$ that $X \sim f_\theta$, $\theta \in \Theta_0$, if
$x$ is observed. The restriction $\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$ then says that, if $H_0$ were
true, $\varphi$ rejects it with a probability $\le \alpha$. We will call such a test a randomized
test function. If $\varphi(x) = I_A(x)$, $\varphi$ will be called a nonrandomized test. If $x \in A$,
we reject $H_0$ with probability 1; and if $x \notin A$, this probability is 0. Needless to
say, $A \subseteq \mathfrak{X}_n$.

We next turn our attention to the type II error. 


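Definition 6. Let $\varphi$ be a test function for the problem $(\alpha, \Theta_0, \Theta_1)$. For every
$\theta \in \Theta$ define

    (3)    $\beta_\varphi(\theta) = E_\theta \varphi(X) = P_\theta\{\text{Reject } H_0\}.$


As a function of $\theta$, $\beta_\varphi(\theta)$ is called the power function of the test $\varphi$. For
any $\theta \in \Theta_1$, $\beta_\varphi(\theta)$ is called the power of $\varphi$ against the alternative $\theta$.


In view of Definitions 5 and 6 the problem of testing of hypotheses
may now be reformulated. Let $X \sim f_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$, $\Theta = \Theta_0 + \Theta_1$. Also,
let $0 < \alpha < 1$ be given. Given a sample point $x$, find a test $\varphi(x)$ such that
$\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$, and $\beta_\varphi(\theta)$ is a maximum for $\theta \in \Theta_1$.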


Definition 7. Let $\Phi_\alpha$ be the class of all tests for the problem $(\alpha, \Theta_0, \Theta_1)$.
A test $\varphi_0 \in \Phi_\alpha$ is said to be a most powerful (MP) test against an alternative
$\theta \in \Theta_1$ if

    (4)    $\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta)$    for all $\varphi \in \Phi_\alpha$.


If $\Theta_1$ contains only one point, this definition suffices. If, on the other hand,
$\Theta_1$ contains at least two points, as will usually be the case, we will have an
MP test corresponding to each $\theta \in \Theta_1$.


Definition 8. A test $\varphi_0 \in \Phi_\alpha$ for the problem $(\alpha, \Theta_0, \Theta_1)$ is said to be a
uniformly most powerful (UMP) test if

    (5)    $\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta)$    for all $\varphi \in \Phi_\alpha$, uniformly in $\theta \in \Theta_1$.


Thus, if $\Theta_0$ and $\Theta_1$ are both composite, the problem is to find a UMP test
$\varphi$ for the problem $(\alpha, \Theta_0, \Theta_1)$. We will see that UMP tests very frequently
do not exist, and we will have to place further restrictions on the class of all
tests, $\Phi_\alpha$.

Note that, if $\varphi_1, \varphi_2$ are two tests and $\lambda$ is a real number, $0 < \lambda < 1$, then
$\lambda \varphi_1 + (1 - \lambda) \varphi_2$ is also a test function, and it follows that the class of all
test functions $\Phi_\alpha$ is convex.


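Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ rv's, where $\mu$ is unknown
but it is known that $\mu \in \Theta = \{\mu_0, \mu_1\}$, $\mu_0 < \mu_1$. Let $H_0: X_i \sim \mathcal{N}(\mu_0, 1)$,
$H_1: X_i \sim \mathcal{N}(\mu_1, 1)$. Both $H_0$ and $H_1$ are simple hypotheses. Intuitively, one
would accept $H_0$ if the sample mean $\bar X$ is "closer" to $\mu_0$ than to $\mu_1$; that is
to say, one would reject $H_0$ if $\bar X > k$, and accept $H_0$ otherwise. The constant
$k$ is determined from the level requirements. Note that, under $H_0$, $\bar X \sim \mathcal{N}(\mu_0, 1/n)$,
and, under $H_1$, $\bar X \sim \mathcal{N}(\mu_1, 1/n)$. Given $0 < \alpha < 1$, we have

    $P_{\mu_0}\{\bar X > k\} = P\left\{ \frac{\bar X - \mu_0}{1/\sqrt{n}} > \frac{k - \mu_0}{1/\sqrt{n}} \right\} = P\{\text{Type I error}\} = \alpha,$

so that $k = \mu_0 + z_\alpha/\sqrt{n}$, where $z_\alpha$ is the upper $\alpha$ point of the standard normal
distribution, $P\{Z > z_\alpha\} = \alpha$. The test, therefore, is (Fig. 1)

    $\varphi(x) = \begin{cases} 1 & \text{if } \bar x > \mu_0 + z_\alpha/\sqrt{n}, \\ 0 & \text{otherwise.} \end{cases}$

[Fig. 1]

$\bar X$ is known as a test statistic, and the test $\varphi$ is nonrandomized with critical
region $C = \{x: \bar x > \mu_0 + z_\alpha/\sqrt{n}\}$. Note that in this case the continuity
of $\bar X$ (that is, the absolute continuity of the df of $\bar X$) allows us to achieve
any size $\alpha$, $0 < \alpha < 1$.

The power of the test at $\mu_1$ is given by

    $E_{\mu_1} \varphi(X) = P_{\mu_1}\left\{ \bar X > \mu_0 + \frac{z_\alpha}{\sqrt{n}} \right\}$
    $\quad = P\left\{ \frac{\bar X - \mu_1}{1/\sqrt{n}} > (\mu_0 - \mu_1)\sqrt{n} + z_\alpha \right\}$
    $\quad = P\{Z > z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\},$

where $Z \sim \mathcal{N}(0, 1)$. In particular, $E_{\mu_1} \varphi(X) > \alpha$ since $\mu_1 > \mu_0$. The probability
of type II error is given by

    $P\{\text{Type II error}\} = 1 - E_{\mu_1} \varphi(X) = P\{Z \le z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\}.$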
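
The cutoff and power formulas of Example 5 can be evaluated numerically. The following is a minimal sketch, not part of the original text: the function name `normal_mean_test` and the illustrative values $\mu_0 = 0$, $\mu_1 = 1$, $n = 9$, $\alpha = .05$ are assumptions chosen only for demonstration.

```python
from math import sqrt
from statistics import NormalDist


def normal_mean_test(mu0, mu1, n, alpha):
    """Cutoff k and power at mu1 of the test that rejects H0 when the sample mean exceeds k."""
    std_normal = NormalDist()                      # Z ~ N(0, 1)
    z_alpha = std_normal.inv_cdf(1 - alpha)        # upper alpha point: P{Z > z_alpha} = alpha
    k = mu0 + z_alpha / sqrt(n)                    # k = mu0 + z_alpha / sqrt(n)
    # power = P{Z > z_alpha - sqrt(n) (mu1 - mu0)}
    power = 1 - std_normal.cdf(z_alpha - sqrt(n) * (mu1 - mu0))
    return k, power


# Hypothetical illustration values, not taken from the text.
k, power = normal_mean_test(mu0=0.0, mu1=1.0, n=9, alpha=0.05)
type_two_error = 1 - power
print(f"k = {k:.4f}, power = {power:.4f}, P(type II) = {type_two_error:.4f}")
```

Increasing `n` or the gap `mu1 - mu0` in this sketch raises the power, in line with the expression $P\{Z > z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\}$.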
Example 6. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $b(1, p)$, where
$p$ is unknown and $0 < p < 1$. Consider the simple null hypothesis
$H_0: X_i \sim b(1, \frac{1}{2})$, that is, under $H_0$, $p = \frac{1}{2}$. Then $H_1: X_i \sim b(1, p)$, $p \ne \frac{1}{2}$.
A reasonable procedure would be to compute the average number of 1's,
namely, $\bar X = \sum_{i=1}^{5} X_i/5$, and to accept $H_0$ if $|\bar X - \frac{1}{2}| \le c$, where $c$ is to be
determined. Let $\alpha = .10$. Then we would like to choose $c$ such that the size
of our test is $\alpha$, that is,

    $.10 = P_{p=1/2}\{|\bar X - \tfrac{1}{2}| > c\},$

or

    $.90 = P_{p=1/2}\left\{ -5c \le \sum_{i=1}^{5} X_i - \tfrac{5}{2} \le 5c \right\}$
    (6)    $\quad = P_{p=1/2}\left\{ -k \le \sum_{i=1}^{5} X_i - \tfrac{5}{2} \le k \right\},$

where $k = 5c$. Now $\sum_{i=1}^{5} X_i \sim b(5, \frac{1}{2})$ under $H_0$, so that the pmf of
$\sum_{i=1}^{5} X_i - \frac{5}{2}$ is given in the following table.

    $\sum X_i$    $\sum X_i - 5/2$    $P_{p=1/2}\{\sum X_i = x\}$
        0              -2.5                 .03125
        1              -1.5                 .15625
        2               -.5                 .31250
        3                .5                 .31250
        4               1.5                 .15625
        5               2.5                 .03125

Note that we cannot choose any $k$ to satisfy (6) exactly. It is clear that we
have to reject $H_0$ when $\sum X_i - \frac{5}{2} = \pm 2.5$, that is, when we observe $\sum X_i = 0$ or 5.
The resulting size if we use this test is $\alpha = .03125 + .03125 = .0625 < .10$.
A second procedure would be to reject $H_0$ if $\sum X_i - \frac{5}{2} = \pm 1.5$ or $\pm 2.5$ ($\sum X_i =
0, 1, 4, 5$), in which case the resulting size is $\alpha = .0625 + 2(.15625) = .375$,
which is considerably larger than .10. A third alternative, if we insist on
achieving $\alpha = .10$, is to randomize on the boundary. Instead of accepting
or rejecting $H_0$ with probability 1 when $\sum X_i = 1$ or 4, we reject $H_0$ with
probability $\gamma$, where

    $.10 = P_{p=1/2}\left\{ \sum_{i=1}^{5} X_i = 0 \text{ or } 5 \right\} + \gamma\, P_{p=1/2}\left\{ \sum_{i=1}^{5} X_i = 1 \text{ or } 4 \right\}.$

Thus

    $\gamma = \frac{.10 - .0625}{.3125} = \frac{.0375}{.3125} = .12.$

A randomized test of size $\alpha = .10$ is therefore given by

    $\varphi(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^{5} x_i = 0 \text{ or } 5, \\ .12 & \text{if } \sum_{i=1}^{5} x_i = 1 \text{ or } 4, \\ 0 & \text{otherwise.} \end{cases}$

The power of this test is

    $E_p \varphi(X) = P_p\left\{ \sum_{i=1}^{5} X_i = 0 \text{ or } 5 \right\} + .12\, P_p\left\{ \sum_{i=1}^{5} X_i = 1 \text{ or } 4 \right\},$

where $p \ne \frac{1}{2}$, and can be computed for any value of $p$.
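
The randomization constant of Example 6 can be checked with exact rational arithmetic. This is a sketch under the example's own setup ($n = 5$, $p = \frac{1}{2}$ under $H_0$, $\alpha = .10$); the helper names `binom_pmf` and `power` are illustrative and not from the text.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p, r):
    """P{S = r} for S ~ b(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p0, alpha = 5, Fraction(1, 2), Fraction(1, 10)

# Reject outright when the sum is 0 or 5; randomize when it is 1 or 4.
size_outer = binom_pmf(n, p0, 0) + binom_pmf(n, p0, 5)
prob_boundary = binom_pmf(n, p0, 1) + binom_pmf(n, p0, 4)
gamma = (alpha - size_outer) / prob_boundary


def power(p):
    """E_p phi(X) for the randomized two-sided test."""
    outer = binom_pmf(n, p, 0) + binom_pmf(n, p, 5)
    boundary = binom_pmf(n, p, 1) + binom_pmf(n, p, 4)
    return outer + gamma * boundary


print("gamma =", gamma, "=", float(gamma))
print("size  =", power(p0))
print("power at p = 0.9:", float(power(Fraction(9, 10))))
```

Working with `Fraction` avoids rounding in the small probabilities, so the size of the randomized test comes out as exactly the requested level.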


We conclude this section with the following remarks. 


Remark 1. So far we have been somewhat vague about the specification of
the null hypothesis. The convention used is to choose $H_0$ such that it represents
the status quo or some important situation. If $H_0$ is then rejected
when in fact it is true, there is a significant change or loss. Thus the type I
error is the one most important to the experimenter, and he would like to
limit its probability at a certain fixed level $\alpha$, usually small.

Remark 2. The problem of testing of hypotheses may be considered as
a special case of the general decision problem described in Section 8.8.
Let $\mathcal{A} = \{a_0, a_1\}$, where $a_0$ represents the decision to accept $H_0: \theta \in \Theta_0$,
and $a_1$ represents the decision to reject $H_0$. A decision function $\delta$ is a
mapping of $\mathfrak{X}_n$ into $\mathcal{A}$. Let us introduce the following loss functions:

    $L_1(\theta, a_1) = \begin{cases} 1 & \text{if } \theta \in \Theta_0, \\ 0 & \text{if } \theta \in \Theta_1, \end{cases}$    and $L_1(\theta, a_0) = 0$ for all $\theta$,

and

    $L_2(\theta, a_0) = \begin{cases} 0 & \text{if } \theta \in \Theta_0, \\ 1 & \text{if } \theta \in \Theta_1, \end{cases}$    and $L_2(\theta, a_1) = 0$ for all $\theta$.

Then the minimization of $E_\theta L_2(\theta, \delta(X))$ subject to $E_\theta L_1(\theta, \delta(X)) \le \alpha$ is
the hypotheses testing problem discussed above. We have

    $E_\theta L_2(\theta, \delta(X)) = P_\theta\{x: \delta(x) = a_0\}, \quad \theta \in \Theta_1,$
    $\quad = P_\theta\{\text{Accept } H_0 \mid H_1 \text{ true}\},$

and

    $E_\theta L_1(\theta, \delta(X)) = P_\theta\{x: \delta(x) = a_1\}, \quad \theta \in \Theta_0,$
    $\quad = P_\theta\{\text{Reject } H_0 \mid \theta \in \Theta_0 \text{ true}\}.$


PROBLEMS 9.2 


1. A sample of size 1 is taken from a population distribution $P(\lambda)$. To test
$H_0: \lambda = 1$ against $H_1: \lambda = 2$, consider the nonrandomized test $\varphi(x) = 1$ if $x > 3$,
and $= 0$ if $x \le 3$. Find the probabilities of type I and type II errors and the
power of the test against $\lambda = 2$. If it is required to achieve a size equal to .05,
how should one modify the test $\varphi$?


2. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with finite mean $\mu$ and finite
variance $\sigma^2$. Suppose that $\mu$ is not known, but $\sigma$ is known, and it is required to
test $\mu = \mu_0$ against $\mu = \mu_1$ ($\mu_1 > \mu_0$). Let $n$ be sufficiently large, so that the central
limit theorem holds, and consider the test

    $\varphi(x_1, x_2, \ldots, x_n) = 1$    if $\bar x > k$,
    $\qquad\qquad\qquad\quad\ = 0$    if $\bar x \le k$,

where $\bar x = n^{-1} \sum_{i=1}^{n} x_i$. Find $k$ such that the test has (approximately) size $\alpha$. What
is the power of this test at $\mu = \mu_1$? If the probabilities of type I and type II errors
are fixed at $\alpha$ and $\beta$, respectively, find the smallest sample size needed.


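3. In Problem 2, if $\sigma$ is not known, find $k$ such that the test $\varphi$ has size $\alpha$.


4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$. For testing $\mu \le \mu_0$ against $\mu > \mu_0$
consider the test function

    $\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \bar x > \mu_0 + z_\alpha/\sqrt{n}, \\ 0 & \text{if } \bar x \le \mu_0 + z_\alpha/\sqrt{n}. \end{cases}$

Show that the power function of $\varphi$ is a nondecreasing function of $\mu$. What is the
size of the test?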


5. A sample of size 1 is taken from an exponential pdf with parameter $\theta$, that is,
$X \sim G(1, \theta)$. To test $H_0: \theta = 1$ against $H_1: \theta > 1$, the test to be used is the
nonrandomized test

    $\varphi(x) = \begin{cases} 1 & \text{if } x > 2, \\ 0 & \text{if } x \le 2. \end{cases}$

Find the size of the test. What is the power function?
6. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. To test $H_0: \sigma = \sigma_0$ against
$H_1: \sigma \ne \sigma_0$, it is suggested that the test

    $\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum x_i^2 > c_1 \text{ or } \sum x_i^2 < c_2, \\ 0 & \text{if } c_2 \le \sum x_i^2 \le c_1, \end{cases}$

be used. How will you find $c_1$ and $c_2$ such that the size of $\varphi$ is a preassigned number
$\alpha$, $0 < \alpha < 1$? What is the power function of this test?

7. An urn contains 10 marbles, of which $M$ are white and $10 - M$ are black. To
test that $M = 5$ against the alternative hypothesis that $M = 6$, one draws 3 marbles
from the urn without replacement. The null hypothesis is rejected if the sample
contains 2 or 3 white marbles; otherwise it is accepted. Find the size of the test
and its power.


9.3 THE NEYMAN-PEARSON LEMMA


In this section we prove the fundamental lemma due to Neyman and Pearson
[85], which gives a general method for finding a best (most powerful) test of
a simple hypothesis against a simple alternative. Let $\{f_\theta, \theta \in \Theta\}$, where $\Theta
= \{\theta_0, \theta_1\}$, be a family of possible distributions of $X$. Also, $f_\theta$ represents
the pdf of $X$ if $X$ is a continuous type rv, and the pmf of $X$ if $X$ is of the
discrete type. Let us write $f_0(x) = f_{\theta_0}(x)$ and $f_1(x) = f_{\theta_1}(x)$ for convenience.


Theorem 1 (The Neyman-Pearson Fundamental Lemma).

(a) Any test $\varphi$ of the form

    (1)    $\varphi(x) = \begin{cases} 1 & \text{if } f_1(x) > k f_0(x), \\ \gamma(x) & \text{if } f_1(x) = k f_0(x), \\ 0 & \text{if } f_1(x) < k f_0(x), \end{cases}$

for some $k \ge 0$ and $0 \le \gamma(x) \le 1$, is most powerful of its size for
testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$. If $k = \infty$, the test

    (2)    $\varphi(x) = 1$    if $f_0(x) = 0$,
           $\qquad\ = 0$    if $f_0(x) > 0$,

is most powerful of size 0 for testing $H_0$ against $H_1$.

(b) Given $\alpha$, $0 \le \alpha \le 1$, there exists a test of form (1) or (2) with
$\gamma(x) = \gamma$ (a constant), for which $E_{\theta_0} \varphi(X) = \alpha$.


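Proof. Let $\varphi$ be a test satisfying (1), and $\varphi^*$ be any test with $E_{\theta_0} \varphi^*(X) \le
E_{\theta_0} \varphi(X)$. In the continuous case

    $\int (\varphi(x) - \varphi^*(x))(f_1(x) - k f_0(x))\, dx$
    $\quad = \left( \int_{f_1 > k f_0} + \int_{f_1 < k f_0} \right) (\varphi(x) - \varphi^*(x))(f_1(x) - k f_0(x))\, dx.$

For any $x \in \{f_1(x) > k f_0(x)\}$, $\varphi(x) - \varphi^*(x) = 1 - \varphi^*(x) \ge 0$, so that the
integrand is $\ge 0$. For $x \in \{f_1(x) < k f_0(x)\}$, $\varphi(x) - \varphi^*(x) = -\varphi^*(x) \le 0$,
so that the integrand is again $\ge 0$. It follows that

    $\int (\varphi(x) - \varphi^*(x))(f_1(x) - k f_0(x))\, dx$
    $\quad = E_{\theta_1} \varphi(X) - E_{\theta_1} \varphi^*(X) - k\,(E_{\theta_0} \varphi(X) - E_{\theta_0} \varphi^*(X)) \ge 0,$

which implies

    $E_{\theta_1} \varphi(X) - E_{\theta_1} \varphi^*(X) \ge k\,(E_{\theta_0} \varphi(X) - E_{\theta_0} \varphi^*(X)) \ge 0$

since $E_{\theta_0} \varphi^*(X) \le E_{\theta_0} \varphi(X)$.

If $k = \infty$, any test $\varphi^*$ of size 0 must vanish on the set $\{f_0(x) > 0\}$. We
have

    $E_{\theta_1} \varphi(X) - E_{\theta_1} \varphi^*(X) = \int_{\{f_0(x) = 0\}} (1 - \varphi^*(x)) f_1(x)\, dx \ge 0.$

The proof for the discrete case requires the usual change of integral by a
sum throughout.

To prove (b) we need to restrict ourselves to the case where $0 < \alpha \le 1$,
since the MP size 0 test is given by (2). Let $\gamma(x) = \gamma$, and let us compute
the size of a test of form (1). We have

    $E_{\theta_0} \varphi(X) = P_{\theta_0}\{f_1(X) > k f_0(X)\} + \gamma P_{\theta_0}\{f_1(X) = k f_0(X)\}$
    $\quad = 1 - P_{\theta_0}\{f_1(X) \le k f_0(X)\} + \gamma P_{\theta_0}\{f_1(X) = k f_0(X)\}.$

Since $P_{\theta_0}\{f_0(X) = 0\} = 0$, we may rewrite $E_{\theta_0} \varphi(X)$ as

    (3)    $E_{\theta_0} \varphi(X) = 1 - P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k \right\} + \gamma P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} = k \right\}.$

Given $0 < \alpha \le 1$, we wish to find $k$ and $\gamma$ such that $E_{\theta_0} \varphi(X) = \alpha$, that is,

    (4)    $P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k \right\} - \gamma P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} = k \right\} = 1 - \alpha.$

Note that

    $P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k \right\}$

is a df so that it is a nondecreasing and right continuous function of $k$. If
there exists a $k_0$ such that

    $P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k_0 \right\} = 1 - \alpha,$

we choose $\gamma = 0$ and $k = k_0$. Otherwise there exists a $k_0$ such that

    (5)    $P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} < k_0 \right\} \le 1 - \alpha < P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k_0 \right\},$

that is, there is a jump at $k_0$ (see Fig. 1). In this case we choose $k = k_0$
and

    (6)    $\gamma = \frac{P_{\theta_0}\{f_1(X)/f_0(X) \le k_0\} - (1 - \alpha)}{P_{\theta_0}\{f_1(X)/f_0(X) = k_0\}}.$

Since $\gamma$ given by (6) satisfies (4), and $0 < \gamma \le 1$, the proof is complete.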
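
For a discrete sample space the construction of $k$ and $\gamma$ in part (b) can be carried out mechanically. The sketch below is an illustration only: the function `neyman_pearson`, the dictionary representation of the pmf's, and the pair $b(4, \frac{1}{2})$ versus $b(4, \frac{2}{3})$ with $\alpha = .10$ are assumptions made for the demonstration, not part of the text.

```python
from fractions import Fraction
from math import comb


def neyman_pearson(f0, f1, alpha):
    """Return (k, gamma) of an MP size-alpha test of form (1) for pmf's given as dicts.

    Assumes 0 < alpha <= 1 and that f0 sums to 1.
    """
    # Likelihood ratio f1/f0 on the support of f0.
    ratio = {x: f1.get(x, 0) / f0[x] for x in f0 if f0[x] > 0}
    for k0 in sorted(set(ratio.values())):
        df_at_k0 = sum(f0[x] for x in ratio if ratio[x] <= k0)   # P0{f1/f0 <= k0}
        if df_at_k0 >= 1 - alpha:                                # df first reaches 1 - alpha
            jump = sum(f0[x] for x in ratio if ratio[x] == k0)   # P0{f1/f0 = k0}
            return k0, (df_at_k0 - (1 - alpha)) / jump           # gamma as in (6)
    raise ValueError("no cutoff found; check that f0 sums to 1")


# Hypothetical illustration: b(4, 1/2) under H0 against b(4, 2/3) under H1.
f0 = {x: Fraction(comb(4, x), 16) for x in range(5)}
f1 = {x: comb(4, x) * Fraction(2, 3) ** x * Fraction(1, 3) ** (4 - x) for x in range(5)}
k, gamma = neyman_pearson(f0, f1, Fraction(1, 10))

# Size of the resulting test, computed directly from form (1).
size = (sum(f0[x] for x in f0 if f1[x] > k * f0[x])
        + gamma * sum(f0[x] for x in f0 if f1[x] == k * f0[x]))
print("k =", k, " gamma =", gamma, " size =", size)
```

When the df hits $1 - \alpha$ exactly the sketch returns $\gamma = 0$, and otherwise it returns the value in (6), so both cases of the proof are covered by the same expression.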


Fig. 1 


Remark 1. It is possible to show (see Problem 6) that the test given by (1)
or (2) is unique (except on a null set), that is, if $\varphi$ is an MP test of size $\alpha$ of
$H_0$ against $H_1$, it must have form (1) or (2), except perhaps for a set $A$ with
$P_{\theta_0}(A) = P_{\theta_1}(A) = 0$.


Remark 2. For generalizations of Theorem 1 we refer to Lehmann [70],
page 83, or Ferguson [30], page 204.


Theorem 2. If a sufficient statistic $T$ exists for the family $\{f_\theta: \theta \in \Theta\}$,
$\Theta = \{\theta_0, \theta_1\}$, the Neyman-Pearson MP test is a function of $T$.


Proof. The proof of this result is left as an exercise.


Remark 3. If the family $\{f_\theta: \theta \in \Theta\}$ admits a sufficient statistic, one can
restrict attention to tests based on the sufficient statistic, that is, to tests that
are functions of the sufficient statistic. If $\varphi$ is a test function and $T$ is a sufficient
statistic, $E\{\varphi(X) \mid T\}$ is itself a test function, $0 \le E\{\varphi(X) \mid T\} \le 1$, and

    $E_\theta\{E\{\varphi(X) \mid T\}\} = E_\theta \varphi(X),$

so that $\varphi$ and $E\{\varphi \mid T\}$ have the same power function.


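Example 1. Let $X \sim \mathcal{N}(0, 1)$ under $H_0$ and $X \sim \mathcal{C}(1, 0)$ under $H_1$. To find
an MP size $\alpha$ test of $H_0$ against $H_1$, we compute

    $\frac{f_1(x)}{f_0(x)} = \frac{(1/\pi)[1/(1 + x^2)]}{(1/\sqrt{2\pi})\, e^{-x^2/2}} = \sqrt{\frac{2}{\pi}}\, \frac{e^{x^2/2}}{1 + x^2}.$

Thus the MP test is of the form

    $\varphi(x) = \begin{cases} 1 & \text{if } \sqrt{2/\pi}\, e^{x^2/2}/(1 + x^2) > k, \\ 0 & \text{otherwise,} \end{cases}$

where $k$ is determined so that $E_0 \varphi(X) = \alpha$. Note that the boundary set
has probability 0 under both $H_0$ and $H_1$. The ratio $f_1(x)/f_0(x)$ decreases in $|x|$
on $[0, 1]$ and increases in $|x|$ for $|x| \ge 1$, so that whenever $k \ge f_1(0)/f_0(0) = \sqrt{2/\pi}$,
which is the case for $\alpha$ not too large (roughly $\alpha < .11$), the rejection region is
of the form $\{|x| > k_1\}$. For such $\alpha$ it follows that

    $\varphi(x) = \begin{cases} 1 & \text{if } |x| > k_1, \\ 0 & \text{if } |x| \le k_1, \end{cases}$

where $k_1$ is determined from

    $\int_{-k_1}^{k_1} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 1 - \alpha.$

It follows that $k_1 = z_{\alpha/2}$. The power of the test is given by

    $E_1 \varphi(X) = 1 - \int_{-k_1}^{k_1} \frac{1}{\pi}\, \frac{1}{1 + x^2}\, dx = 1 - \frac{2}{\pi} \tan^{-1} k_1$
    $\quad = 1 - \frac{2}{\pi} \tan^{-1} z_{\alpha/2}.$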

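Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's, and let $H_0: p = p_0$,
$H_1: p = p_1$, $p_1 > p_0$. The MP size $\alpha$ test of $H_0$ against $H_1$ is of the form

    $\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1, & \lambda(x) > k, \\ \gamma, & \lambda(x) = k, \\ 0, & \lambda(x) < k, \end{cases}$
    $\qquad \lambda(x) = \frac{p_1^{\sum x_i} (1 - p_1)^{n - \sum x_i}}{p_0^{\sum x_i} (1 - p_0)^{n - \sum x_i}},$

where $k$ and $\gamma$ are determined from

    $E_{p_0} \varphi(X) = \alpha.$

Now

    $\lambda(x) = \left( \frac{p_1}{p_0} \right)^{\sum x_i} \left( \frac{1 - p_1}{1 - p_0} \right)^{n - \sum x_i},$

and since $p_1 > p_0$, $\lambda(x)$ is an increasing function of $\sum x_i$. It follows that
$\lambda(x) > k$ if and only if $\sum x_i > k_1$, where $k_1$ is some constant. Thus the MP
size $\alpha$ test is of the form

    $\varphi(x) = \begin{cases} 1 & \text{if } \sum x_i > k_1, \\ \gamma & \text{if } \sum x_i = k_1, \\ 0 & \text{otherwise.} \end{cases}$

Also, $k_1$ and $\gamma$ are determined from

    $\alpha = E_{p_0} \varphi(X) = P_{p_0}\left\{ \sum X_i > k_1 \right\} + \gamma P_{p_0}\left\{ \sum X_i = k_1 \right\}$
    $\quad = \sum_{r = k_1 + 1}^{n} \binom{n}{r} p_0^r (1 - p_0)^{n - r} + \gamma \binom{n}{k_1} p_0^{k_1} (1 - p_0)^{n - k_1}.$

Note that the MP size $\alpha$ test is independent of $p_1$ as long as $p_1 > p_0$, that
is, it remains an MP size $\alpha$ test against any $p > p_0$ and is therefore a UMP
test of $p = p_0$ against $p > p_0$.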

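In particular, let $n = 5$, $p_0 = \frac{1}{2}$, $p_1 = \frac{3}{4}$, and $\alpha = .05$. Then the MP test
is given by

    $\varphi(x) = \begin{cases} 1, & \sum x_i > k, \\ \gamma, & \sum x_i = k, \\ 0, & \sum x_i < k, \end{cases}$

where $k$ and $\gamma$ are determined from

    $.05 = \sum_{r = k + 1}^{5} \binom{5}{r} \left( \tfrac{1}{2} \right)^5 + \gamma \binom{5}{k} \left( \tfrac{1}{2} \right)^5.$

It follows that $k = 4$ and $\gamma = (.05 - .03125)/.15625 = .12$. Thus the MP size $\alpha = .05$ test is to reject
$p = \frac{1}{2}$ in favor of $p = \frac{3}{4}$ if $\sum X_i = 5$ and reject $p = \frac{1}{2}$ with probability
.12 if $\sum X_i = 4$.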
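
The constants of the one-sided binomial test can be found by accumulating upper-tail probabilities until the level would be exceeded. The sketch below assumes the setup $n = 5$, $p_0 = \frac{1}{2}$, $\alpha = .05$; the names `upper_tail_test` and `power` are illustrative, and the alternative $p = \frac{3}{4}$ is used only to evaluate the power at one point.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p, r):
    """P{S = r} for S ~ b(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def upper_tail_test(n, p0, alpha):
    """Return (k, gamma) with P{S > k} + gamma * P{S = k} = alpha under p0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    tail = Fraction(0)                       # running value of P{S > k}
    for k in range(n, -1, -1):
        at_k = binom_pmf(n, p0, k)
        if tail + at_k > alpha:              # rejecting outright at k would exceed alpha
            return k, (alpha - tail) / at_k
        tail += at_k


def power(n, p, k, gamma):
    """E_p phi(X) for the test that rejects when S > k and with probability gamma when S = k."""
    upper = sum(binom_pmf(n, p, r) for r in range(k + 1, n + 1))
    return upper + gamma * binom_pmf(n, p, k)


k, gamma = upper_tail_test(5, Fraction(1, 2), Fraction(1, 20))
print("k =", k, " gamma =", gamma)
print("size =", power(5, Fraction(1, 2), k, gamma))
print("power at p = 3/4:", float(power(5, Fraction(3, 4), k, gamma)))
```

Because `upper_tail_test` never looks at the alternative, the same $(k, \gamma)$ serves every $p > p_0$, which is the computational counterpart of the UMP property noted in the example.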
It is simply a matter of reversing inequalities to see that the MP size $\alpha$
test of $H_0: p = p_0$ against $H_1: p = p_1$ ($p_1 < p_0$) is given by

    $\varphi(x) = \begin{cases} 1 & \text{if } \sum x_i < k, \\ \gamma & \text{if } \sum x_i = k, \\ 0 & \text{if } \sum x_i > k, \end{cases}$

where $\gamma$ and $k$ are determined from $E_{p_0} \varphi(X) = \alpha$.

We remark that in both cases ($p_1 > p_0$, $p_1 < p_0$) the MP test is quite
intuitive. We would tend to accept the larger probability if a larger number
of "successes" showed up, and the smaller probability if a smaller number
of "successes" were observed.
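Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ rv's where both $\mu$ and $\sigma^2$ are
unknown. We wish to test the null hypothesis $H_0: \mu = \mu_0$, $\sigma^2 = \sigma_0^2$ against
the alternative $H_1: \mu = \mu_1$, $\sigma^2 = \sigma_0^2$. The fundamental lemma leads to the
following MP test:

    $\varphi(x) = \begin{cases} 1 & \text{if } \lambda(x) > k, \\ 0 & \text{if } \lambda(x) < k, \end{cases}$

where

    $\lambda(x) = \frac{(1/\sigma_0 \sqrt{2\pi})^n \exp\{ -\sum (x_i - \mu_1)^2 / 2\sigma_0^2 \}}{(1/\sigma_0 \sqrt{2\pi})^n \exp\{ -\sum (x_i - \mu_0)^2 / 2\sigma_0^2 \}},$

and $k$ is determined from $E_{\mu_0, \sigma_0} \varphi(X) = \alpha$. We have

    $\lambda(x) = \exp\left\{ \sum x_i \left( \frac{\mu_1}{\sigma_0^2} - \frac{\mu_0}{\sigma_0^2} \right) + n \left( \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_0^2} \right) \right\}.$

If $\mu_1 > \mu_0$, then

    $\lambda(x) > k$    if and only if    $\sum x_i > k',$

where $k'$ is determined from

    $\alpha = P_{\mu_0, \sigma_0}\left\{ \sum X_i > k' \right\} = P\left\{ \frac{\sum X_i - n\mu_0}{\sqrt{n}\, \sigma_0} > \frac{k' - n\mu_0}{\sqrt{n}\, \sigma_0} \right\},$

giving $k' = z_\alpha \sqrt{n}\, \sigma_0 + n\mu_0$. The case $\mu_1 < \mu_0$ is treated similarly. If $\sigma_0$ is
known, the test determined above is independent of $\mu_1$ as long as $\mu_1 > \mu_0$,
and it follows that the test is UMP against $H_1': \mu > \mu_0$, $\sigma^2 = \sigma_0^2$. If, however,
$\sigma_0$ is not known, that is, the null hypothesis is a composite hypothesis
$H_0'': \mu = \mu_0$, $\sigma^2 > 0$ to be tested against the alternatives $H_1'': \mu = \mu_1$, $\sigma^2 > 0$
($\mu_1 > \mu_0$), then the MP test determined above depends on $\sigma^2$. In other words,
an MP test against the alternative $\mu_1, \sigma_0^2$ will not be MP against $\mu_1, \sigma_1^2$,
where $\sigma_1^2 \ne \sigma_0^2$.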


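PROBLEMS 9.3

1. A sample of size 1 is taken from pdf

    $f_\theta(x) = \begin{cases} (2/\theta^2)(\theta - x) & \text{if } 0 < x < \theta, \\ 0 & \text{otherwise.} \end{cases}$

Find an MP test of $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ ($\theta_1 < \theta_0$).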




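2. Find the Neyman-Pearson size $\alpha$ test of $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ ($\theta_1 < \theta_0$),
based on a sample of size 1 from the pdf

    $f_\theta(x) = 2\theta x + 2(1 - \theta)(1 - x)$,    $0 < x < 1$, $\theta \in [0, 1]$.

3. Find the Neyman-Pearson size $\alpha$ test of $H_0: \beta = 1$ against $H_1: \beta = \beta_1$ ($> 1$),
based on a sample of size 1 from

    $f(x; \beta) = \begin{cases} \beta x^{\beta - 1}, & 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$

4. Find an MP size $\alpha$ test of $H_0: X \sim f_0(x)$, where $f_0(x) = (2\pi)^{-1/2} e^{-x^2/2}$,
$-\infty < x < \infty$, against $H_1: X \sim f_1(x)$, where $f_1(x) = 2^{-1} e^{-|x|}$, $-\infty < x < \infty$,
based on a sample of size 1.


5. For the pdf $f_\theta(x) = e^{-(x - \theta)}$, $x \ge \theta$, find an MP size $\alpha$ test of $\theta = \theta_0$ against
$\theta = \theta_1$ ($> \theta_0$), based on a sample of size $n$.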


6. If $\varphi^*$ is an MP size $\alpha$ test of $H_0: X \sim f_0(x)$ against $H_1: X \sim f_1(x)$, show that it
has to be either of form (1) or form (2) (except for a set of $x$ that has probability
0 under $H_0$ and $H_1$).


7. Let $\varphi^*$ be an MP size $\alpha$ ($0 < \alpha < 1$) test of $H_0$ against $H_1$, and let $k(\alpha)$ denote
the value of $k$ in (1). Show that if $\alpha_1 < \alpha_2$, then $k(\alpha_2) \le k(\alpha_1)$.


8. For the family of Neyman-Pearson tests show that the larger the $\alpha$, the smaller
the $\beta$ ($= P\{\text{Type II error}\}$).


9. Let $1 - \beta$ be the power of an MP size $\alpha$ test, where $0 < \alpha < 1$. Show that
$\alpha < 1 - \beta$ unless $P_{\theta_0} = P_{\theta_1}$.


10. Let $\alpha$ be a real number, $0 < \alpha < 1$, and $\varphi^*$ be an MP size $\alpha$ test of $H_0$ against
$H_1$. Also, let $\beta = E_{H_1} \varphi^*(X) < 1$. Show that $1 - \varphi^*$ is an MP test for testing $H_1$
against $H_0$ at level $1 - \beta$.

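11. Let $X_1, X_2, \ldots, X_n$ be a random sample from pdf

    $f_\theta(x) = \frac{\theta}{x^2}$    if $0 < \theta \le x < \infty$.

Find an MP test of $\theta = \theta_0$ against $\theta = \theta_1$ ($\ne \theta_0$).


12. Let $X$ be an observation in $(0, 1)$. Find an MP size $\alpha$ test of $H_0: X \sim f(x) = 4x$
if $0 < x < \frac{1}{2}$, and $= 4 - 4x$ if $\frac{1}{2} \le x < 1$, against $H_1: X \sim f(x) = 1$ if $0 < x < 1$.
Find the power of your test.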


9.4 FAMILIES WITH MONOTONE LIKELIHOOD RATIO 


In this section we consider the problem of testing one-sided hypotheses on a
single real-valued parameter. Let $\{f_\theta, \theta \in \Theta\}$ be a family of pdf's (pmf's),
$\Theta \subseteq \mathcal{R}$, and suppose that we wish to test $H_0: \theta \le \theta_0$ against the alternatives
$H_1: \theta > \theta_0$ or its dual, $H_0': \theta \ge \theta_0$ against $H_1': \theta < \theta_0$. In general, it is not
possible to find a UMP test for this problem. The MP test of $H_0: \theta \le \theta_0$, say,
against the alternative $\theta = \theta_1$ ($> \theta_0$) depends on $\theta_1$ and cannot be UMP.
Here we consider a special class of distributions that is large enough to include
the one-parameter exponential family, for which a UMP test of a one-sided
hypothesis exists.


Definition 1. Let $\{f_\theta: \theta \in \Theta\}$ be a family of pdf's (pmf's), $\Theta \subseteq \mathcal{R}$. We say
that $\{f_\theta\}$ has a monotone likelihood ratio (MLR) in the statistic $T(x)$ if for
$\theta_1 < \theta_2$, whenever $f_{\theta_1}, f_{\theta_2}$ are distinct, the ratio $f_{\theta_2}(x)/f_{\theta_1}(x)$ is a nondecreasing
function of $T(x)$ for the set of values $x$ for which at least one of $f_{\theta_1}$
and $f_{\theta_2}$ is $> 0$.


It is also possible to define families of densities with nonincreasing MLR in 
T(x), but such families can be treated by symmetry. 


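Example 1. Let $X_1, X_2, \ldots, X_n \sim U[0, \theta]$, $\theta > 0$. The joint pdf of $X_1, \ldots, X_n$
is

    $f_\theta(x) = \begin{cases} 1/\theta^n, & 0 \le \max x_i \le \theta, \\ 0, & \text{otherwise.} \end{cases}$

Let $\theta_1 > \theta_2$ and consider the ratio

    $\frac{f_{\theta_1}(x)}{f_{\theta_2}(x)} = \frac{(1/\theta_1^n)\, I_{[\max x_i \le \theta_1]}}{(1/\theta_2^n)\, I_{[\max x_i \le \theta_2]}} = \left( \frac{\theta_2}{\theta_1} \right)^n \frac{I_{[\max x_i \le \theta_1]}}{I_{[\max x_i \le \theta_2]}}.$

Let

    $R(x) = \frac{I_{[\max x_i \le \theta_1]}}{I_{[\max x_i \le \theta_2]}} = \begin{cases} 1, & \max x_i \in [0, \theta_2], \\ \infty, & \max x_i \in (\theta_2, \theta_1]. \end{cases}$

Define $R(x) = \infty$ if $\max x_i > \theta_1$. It follows that $f_{\theta_1}/f_{\theta_2}$ is a nondecreasing
function of $\max_{1 \le i \le n} x_i$, and the family of uniform densities on $[0, \theta]$ has
an MLR in $\max_{1 \le i \le n} x_i$.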


Theorem 1. The one-parameter exponential family

    (1)    $f_\theta(x) = \exp\{Q(\theta) T(x) + S(x) + D(\theta)\},$

where $Q(\theta)$ is nondecreasing, has an MLR in $T(x)$.

Proof. The proof is left as an exercise.

Remark 1. The nondecreasingness of $Q(\theta)$ can be obtained by a reparametrization,
putting $\vartheta = Q(\theta)$, if necessary.

Theorem 1 includes normal, binomial, Poisson, gamma (one parameter
fixed), beta (one parameter fixed), and so on. In Example 1 we have already
seen that $U[0, \theta]$, which is not an exponential family, has an MLR.


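Example 2. Let $X \sim \mathcal{C}(1, \theta)$. Then

    $\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{1 + (x - \theta_1)^2}{1 + (x - \theta_2)^2} \to 1$    as $x \to \pm\infty$,

and we see that $\mathcal{C}(1, \theta)$ does not have an MLR.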


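Theorem 2. Let $X \sim f_\theta$, $\theta \in \Theta$, where $\{f_\theta\}$ has an MLR in $T(x)$. For testing
$H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$, $\theta_0 \in \Theta$, any test of the form

    (2)    $\varphi(x) = \begin{cases} 1 & \text{if } T(x) > t_0, \\ \gamma & \text{if } T(x) = t_0, \\ 0 & \text{if } T(x) < t_0, \end{cases}$

has a nondecreasing power function and is UMP of its size $E_{\theta_0} \varphi(X) = \alpha$
(provided that the size is not 0).

Moreover, for every $0 \le \alpha \le 1$ and every $\theta_0 \in \Theta$, there exists a $t_0$,
$-\infty \le t_0 \le \infty$, and $0 \le \gamma \le 1$ such that the test described in (2) is the
UMP size $\alpha$ test of $H_0$ against $H_1$.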


Proof. Let $\theta_1, \theta_2 \in \Theta$, $\theta_1 < \theta_2$. By the fundamental lemma any test of the
form

    (3)    $\varphi(x) = \begin{cases} 1, & \lambda(x) > k, \\ \gamma(x), & \lambda(x) = k, \\ 0, & \lambda(x) < k, \end{cases}$

where $\lambda(x) = f_{\theta_2}(x)/f_{\theta_1}(x)$, is MP of its size for testing $\theta = \theta_1$ against $\theta = \theta_2$,
provided that $0 \le k < \infty$; and if $k = \infty$, the test

    (4)    $\varphi(x) = \begin{cases} 1 & \text{if } f_{\theta_1}(x) = 0, \\ 0 & \text{if } f_{\theta_1}(x) > 0, \end{cases}$

is MP of size 0. Since $f_\theta$ has an MLR in $T$, it follows that any test of form
(2) is also of form (3), provided that $E_{\theta_1} \varphi(X) > 0$, that is, provided that its
size is $> 0$. The trivial test $\varphi'(x) \equiv \alpha$ has size $\alpha$ and power $\alpha$, so that the
power of any test (2) is at least $\alpha$, that is,

    $E_{\theta_2} \varphi(X) \ge E_{\theta_2} \varphi'(X) = \alpha = E_{\theta_1} \varphi(X).$

It follows that, if $\theta_1 < \theta_2$ and $E_{\theta_1} \varphi(X) > 0$, then $E_{\theta_1} \varphi(X) \le E_{\theta_2} \varphi(X)$, as
asserted.

Let $\theta_1 = \theta_0$ and $\theta_2 > \theta_0$, as above. We know that (2) is an MP test of
its size $E_{\theta_0} \varphi(X)$ for testing $\theta = \theta_0$ against $\theta = \theta_2$ ($\theta_2 > \theta_0$), provided that
$E_{\theta_0} \varphi(X) > 0$. Since the power function of $\varphi$ is nondecreasing,

    (5)    $E_\theta \varphi(X) \le E_{\theta_0} \varphi(X) = \alpha_0$    for all $\theta \le \theta_0$.

Since, however, $\varphi$ does not depend on $\theta_2$ (it depends only on the constants $t_0$
and $\gamma$), it follows that $\varphi$ is the UMP size $\alpha_0$ test for testing $\theta = \theta_0$ against
$\theta > \theta_0$. Thus $\varphi$ is UMP among the class of tests $\varphi'$ for which

    (6)    $E_{\theta_0} \varphi'(X) \le E_{\theta_0} \varphi(X) = \alpha_0.$

Now the class of tests satisfying (5) is contained in the class of tests
satisfying (6) [there are more restrictions in (5)]. It follows that $\varphi$, which
is UMP in the larger class satisfying (6), must also be UMP in the smaller
class satisfying (5). Thus, provided that $\alpha_0 > 0$, $\varphi$ is the UMP size $\alpha_0$ test
for $\theta \le \theta_0$ against $\theta > \theta_0$.

We ask the reader to complete the proof of the final part of the theorem,
using the fundamental lemma.


Remark 2. By interchanging inequalities throughout in Theorem 2, we see
that this theorem also provides a solution of the dual problem $H_0': \theta \ge \theta_0$
against $H_1': \theta < \theta_0$.


Example 3. Let $X$ have the pmf

    $P_M\{X = x\} = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}},$    $x = 0, 1, 2, \ldots, M$.

Since

    $\frac{P_{M+1}\{X = x\}}{P_M\{X = x\}} = \frac{M + 1}{N - M} \cdot \frac{N - M - n + x}{M + 1 - x},$

we see that $\{P_M\}$ has an MLR in $x$ ($P_{M_2}/P_{M_1}$, where $M_2 > M_1$, is just a
product of such ratios). It follows that there exists a UMP test of
$H_0: M \le M_0$ against $H_1: M > M_0$, which rejects $H_0$ when $X$ is too large,
that is, the UMP size $\alpha$ test is given by

    $\varphi(x) = \begin{cases} 1, & x > k, \\ \gamma, & x = k, \\ 0, & x < k, \end{cases}$

where (integer) $k$ and $\gamma$ are determined from

    $E_{M_0} \varphi(X) = \alpha.$
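
The constants $k$ and $\gamma$ of the hypergeometric test can be computed by summing upper-tail probabilities under $M = M_0$. The following sketch is an illustration, not part of the text: the function names are hypothetical, and the values $N = 10$, $M_0 = 5$, $n = 3$, $\alpha = .10$ are chosen only as a small example.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(N, M, n, x):
    """P_M{X = x}: x marked items in n draws without replacement, M marked among N."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))


def ump_hypergeom_test(N, M0, n, alpha):
    """Return (k, gamma) with P{X > k} + gamma * P{X = k} = alpha when M = M0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    tail = Fraction(0)                         # running value of P{X > k}
    for k in range(n, -1, -1):
        at_k = hypergeom_pmf(N, M0, n, k)
        if tail + at_k > alpha:                # rejecting outright at k would exceed alpha
            return k, (alpha - tail) / at_k
        tail += at_k


def power(N, M, n, k, gamma):
    """E_M phi(X) for the test rejecting when X > k and with probability gamma when X = k."""
    upper = sum(hypergeom_pmf(N, M, n, x) for x in range(k + 1, n + 1))
    return upper + gamma * hypergeom_pmf(N, M, n, k)


# Hypothetical illustration values.
k, gamma = ump_hypergeom_test(N=10, M0=5, n=3, alpha=Fraction(1, 10))
print("k =", k, " gamma =", gamma)
print("size =", power(10, 5, 3, k, gamma), " power at M = 6:", power(10, 6, 3, k, gamma))
```

Evaluating `power` for $M = 0, 1, \ldots, N$ gives the whole power function, which should be nondecreasing in $M$ by Theorem 2.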


For the one-parameter exponential family UMP tests exist also for some
two-sided hypotheses of the form

    (7)    $H_0: \theta \le \theta_1$ or $\theta \ge \theta_2$    ($\theta_1 < \theta_2$).


We state the following result without proof.


Theorem 3. For the one-parameter exponential family (1), there exists a
UMP test of the hypothesis $H_0: \theta \le \theta_1$ or $\theta \ge \theta_2$ ($\theta_1 < \theta_2$) against $H_1: \theta_1 <
\theta < \theta_2$ that is of the form

    (8)    $\varphi(x) = \begin{cases} 1 & \text{if } c_1 < T(x) < c_2, \\ \gamma_i & \text{if } T(x) = c_i,\ i = 1, 2 \quad (c_1 < c_2), \\ 0 & \text{if } T(x) < c_1 \text{ or } > c_2, \end{cases}$

where the $c$'s and the $\gamma$'s are given by

    (9)    $E_{\theta_1} \varphi(X) = E_{\theta_2} \varphi(X) = \alpha.$


See Lehmann [70], page 88, for proof.


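Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ rv's. To test $H_0: \mu \le \mu_0$ or
$\mu \ge \mu_1$ ($\mu_1 > \mu_0$) against $H_1: \mu_0 < \mu < \mu_1$, the UMP test is given by

    $\varphi(x) = \begin{cases} 1 & \text{if } c_1 < \sum x_i < c_2, \\ \gamma_i & \text{if } \sum x_i = c_1 \text{ or } c_2, \\ 0 & \text{if } \sum x_i < c_1 \text{ or } > c_2, \end{cases}$

where we determine $c_1, c_2$ from

    $\alpha = P_{\mu_0}\left\{ c_1 < \sum X_i < c_2 \right\} = P_{\mu_1}\left\{ c_1 < \sum X_i < c_2 \right\}$

and $\gamma_1 = \gamma_2 = 0$. Thus

    $\alpha = P\left\{ \frac{c_1 - n\mu_0}{\sqrt{n}} < \frac{\sum X_i - n\mu_0}{\sqrt{n}} < \frac{c_2 - n\mu_0}{\sqrt{n}} \right\}$
    $\quad = P\left\{ \frac{c_1 - n\mu_1}{\sqrt{n}} < \frac{\sum X_i - n\mu_1}{\sqrt{n}} < \frac{c_2 - n\mu_1}{\sqrt{n}} \right\}$
    $\quad = P\left\{ \frac{c_1 - n\mu_0}{\sqrt{n}} < Z < \frac{c_2 - n\mu_0}{\sqrt{n}} \right\}$
    $\quad = P\left\{ \frac{c_1 - n\mu_1}{\sqrt{n}} < Z < \frac{c_2 - n\mu_1}{\sqrt{n}} \right\},$

where $Z$ is $\mathcal{N}(0, 1)$. Given $\alpha$, $n$, $\mu_0$, and $\mu_1$, we can solve for $c_1$ and $c_2$
from the simultaneous equations

    $\alpha = \Phi\left( \frac{c_2 - n\mu_0}{\sqrt{n}} \right) - \Phi\left( \frac{c_1 - n\mu_0}{\sqrt{n}} \right) = \Phi\left( \frac{c_2 - n\mu_1}{\sqrt{n}} \right) - \Phi\left( \frac{c_1 - n\mu_1}{\sqrt{n}} \right),$

where $\Phi$ is the df of $Z$.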
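
The simultaneous equations for $c_1$ and $c_2$ have no closed form but are easy to solve numerically. The sketch below relies on an observation that is not stated in the text and is therefore an assumption of the sketch: by the symmetry of the normal df, an interval centered at $n(\mu_0 + \mu_1)/2$ has the same probability under $\mu_0$ and under $\mu_1$, so it suffices to find its half-width by bisection. The function name and the values $\mu_0 = 0$, $\mu_1 = 1$, $n = 4$, $\alpha = .05$ are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_cutoffs(mu0, mu1, n, alpha, tol=1e-12):
    """Solve alpha = P_mu0{c1 < S < c2} = P_mu1{c1 < S < c2} for S ~ N(n mu, n)."""
    Phi = NormalDist().cdf
    center = n * (mu0 + mu1) / 2               # symmetric choice: c1 = center - d, c2 = center + d

    def prob(d):
        """P_mu0{center - d < S < center + d}; increasing in d."""
        return (Phi((center + d - n * mu0) / sqrt(n))
                - Phi((center - d - n * mu0) / sqrt(n)))

    lo, hi = 0.0, 1.0
    while prob(hi) < alpha:                    # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:                       # bisection on the half-width d
        mid = (lo + hi) / 2
        if prob(mid) < alpha:
            lo = mid
        else:
            hi = mid
    d = (lo + hi) / 2
    return center - d, center + d


# Hypothetical illustration values.
c1, c2 = two_sided_cutoffs(mu0=0.0, mu1=1.0, n=4, alpha=0.05)
print(f"c1 = {c1:.4f}, c2 = {c2:.4f}")
```

The test then rejects $H_0$ when $c_1 < \sum x_i < c_2$; substituting the returned values into both equations is a quick way to confirm that the symmetric choice satisfies them.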


Remark 3. We caution the reader that UMP tests for testing $H_0: \theta_1 \le \theta \le \theta_2$
and $H_0': \theta = \theta_0$ for the one-parameter exponential family do not exist. An
example will suffice.


Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. Since the family
of joint pdf's of $X = (X_1, \ldots, X_n)$ has an MLR in $\sum_{i=1}^{n} X_i^2$, it follows that
UMP tests exist for one-sided hypotheses $\sigma \ge \sigma_0$ and $\sigma \le \sigma_0$.

Consider now the null hypothesis $H_0: \sigma = \sigma_0$ against the alternative
$H_1: \sigma \ne \sigma_0$. We will show that a UMP test of $H_0$ does not exist. For
testing $\sigma = \sigma_0$ against $\sigma > \sigma_0$, a test of the form

    $\varphi_1(x) = \begin{cases} 1, & \sum_{i=1}^{n} x_i^2 > c_1, \\ 0, & \text{otherwise,} \end{cases}$

is UMP, and for testing $\sigma = \sigma_0$ against $\sigma < \sigma_0$, a test of the form

    $\varphi_2(x) = \begin{cases} 1, & \sum_{i=1}^{n} x_i^2 < c_2, \\ 0, & \text{otherwise,} \end{cases}$

is UMP. If the size is chosen as $\alpha$, then $c_1 = \sigma_0^2 \chi^2_{n, \alpha}$ and $c_2 = \sigma_0^2 \chi^2_{n, 1 - \alpha}$.
Clearly, neither $\varphi_1$ nor $\varphi_2$ is UMP for $H_0$ against $H_1: \sigma \ne \sigma_0$. The power of
any test of $H_0$ for values of $\sigma > \sigma_0$ cannot exceed that of $\varphi_1$, and for values of
$\sigma < \sigma_0$ it cannot exceed the power of test $\varphi_2$. Hence no test of $H_0$ can be
UMP (see Fig. 1).

[Fig. 1]


PROBLEMS 9.4 


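1. For the following families of pmf's (pdf's) $f_\theta(x)$, $\theta \in \Theta \subseteq \mathcal{R}$, find a UMP size
$\alpha$ test of $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$, based on a sample of $n$ observations.

(a) $f_\theta(x) = \theta^x (1 - \theta)^{1 - x}$, $x = 0, 1$; $0 < \theta < 1$.

(b) $f_\theta(x) = (1/\sqrt{2\pi}) \exp\{-(x - \theta)^2/2\}$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.

(c) $f_\theta(x) = e^{-\theta} (\theta^x / x!)$, $x = 0, 1, 2, \ldots$; $\theta > 0$.

(d) $f_\theta(x) = (1/\theta) e^{-x/\theta}$, $x > 0$, $\theta > 0$.

(e) $f_\theta(x) = [1/\Gamma(\theta)] x^{\theta - 1} e^{-x}$, $x > 0$, $\theta > 0$.

(f) $f_\theta(x) = (1/\theta) x^{(1/\theta) - 1}$, $0 < x < 1$, $\theta > 0$.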

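2. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from the pmf

    $P_N(x) = \frac{1}{N}$,    $x = 1, 2, \ldots, N$; $N \ge 1$ an integer.

(a) Show that the test

    $\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0, \\ \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \le N_0, \end{cases}$

is UMP size $\alpha$ for testing $H_0: N \le N_0$ against $H_1: N > N_0$.

(b) Show that

    $\varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \le \alpha^{1/n} N_0, \\ 0 & \text{otherwise,} \end{cases}$

is a UMP size $\alpha$ test of $H_0': N = N_0$ against $H_1': N \ne N_0$.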
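3. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from $U(0, \theta)$, $\theta > 0$. Show that the test

    $\varphi_1(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0, \\ \alpha & \text{if } \max(x_1, \ldots, x_n) \le \theta_0, \end{cases}$

is UMP size $\alpha$ for testing $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$, and that the test

    $\varphi_2(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0 \text{ or } \max(x_1, \ldots, x_n) \le \theta_0\, \alpha^{1/n}, \\ 0 & \text{otherwise,} \end{cases}$

is UMP size $\alpha$ for $H_0': \theta = \theta_0$ against $H_1': \theta \ne \theta_0$.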


4. Does the Laplace family of pdf's

    $f_\theta(x) = \tfrac{1}{2} \exp\{-|x - \theta|\}$,    $-\infty < x < \infty$, $\theta \in \mathcal{R}$,

possess an MLR?


9.5 UNBIASED AND INVARIANT TESTS 


We have seen that, if we restrict ourselves to the class $\Phi_\alpha$ of all size $\alpha$ tests,
there do not exist UMP tests for many important hypotheses. This suggests
that we reduce the class of tests under consideration by imposing certain
restrictions.


Definition 1. A size $\alpha$ test $\varphi$ of $H_0: \theta \in \Theta_0$ against the alternatives $H_1: \theta \in \Theta_1$
is said to be unbiased if

    (1)    $E_\theta \varphi(X) \ge \alpha$    for all $\theta \in \Theta_1$.


It follows that a test $\varphi$ is unbiased if and only if its power function $\beta_\varphi(\theta)$
satisfies

    (2)    $\beta_\varphi(\theta) \le \alpha$    for $\theta \in \Theta_0$

and

    (3)    $\beta_\varphi(\theta) \ge \alpha$    for $\theta \in \Theta_1$.


This seems to be a reasonable requirement to place on a test. The probability
of rejecting $H_0$ when false is at least as large as the probability of rejecting $H_0$ when
true.


Definition 2. Let $U_\alpha$ be the class of all unbiased size $\alpha$ tests of $H_0$. If there
exists a test $\varphi \in U_\alpha$ that has maximum power at each $\theta \in \Theta_1$, we call $\varphi$ a UMP
unbiased size $\alpha$ test.


Clearly $U_\alpha \subset \Phi_\alpha$. If a UMP test exists in $\Phi_\alpha$, it is UMP in $U_\alpha$. This follows
by comparing the power of the UMP test with that of the trivial test
$\varphi(x) \equiv \alpha$. It is convenient to introduce another class of tests.


Definition 3. A test $\varphi$ is said to be $\alpha$-similar on a subset $\Theta^*$ of $\Theta$ if

    (4)    $\beta_\varphi(\theta) = E_\theta \varphi(X) = \alpha$    for $\theta \in \Theta^*$.

A test is said to be similar on a set $\Theta^* \subseteq \Theta$ if it is $\alpha$-similar on $\Theta^*$ for some
$\alpha$, $0 \le \alpha \le 1$.


It is clear that there exists at least one similar test on every $\Theta^*$, namely,
$\varphi(x) \equiv \alpha$, $0 \le \alpha \le 1$.

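Theorem 1. Let $\beta_\varphi(\theta)$ be continuous in $\theta$ for any $\varphi$. If $\varphi$ is an unbiased size
$\alpha$ test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$, it is $\alpha$-similar on the boundary
$\Lambda = \bar{\Theta}_0 \cap \bar{\Theta}_1$. (Here $\bar{A}$ is the closure of set $A$.)

Proof. Let $\theta \in \Lambda$. Then there exists a sequence $\{\theta_n\}$, $\theta_n \in \Theta_0$, such that
$\theta_n \to \theta$. Since $\beta_\varphi(\theta)$ is continuous, $\beta_\varphi(\theta_n) \to \beta_\varphi(\theta)$; and since $\beta_\varphi(\theta_n) \le \alpha$ for
$\theta_n \in \Theta_0$, $\beta_\varphi(\theta) \le \alpha$. Similarly there exists a sequence $\{\theta_n'\}$, $\theta_n' \in \Theta_1$, such that
$\beta_\varphi(\theta_n') \ge \alpha$ ($\varphi$ is unbiased) and $\theta_n' \to \theta$. Thus $\beta_\varphi(\theta_n') \to \beta_\varphi(\theta)$, and it follows
that $\beta_\varphi(\theta) \ge \alpha$. Hence $\beta_\varphi(\theta) = \alpha$ for $\theta \in \Lambda$, and $\varphi$ is $\alpha$-similar on $\Lambda$.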


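Remark 1. Thus, if $\beta_\varphi(\theta)$ is continuous in $\theta$ for any $\varphi$, an unbiased size $\alpha$
test of $H_0$ against $H_1$ is also $\alpha$-similar for the pdf's (pmf's) of $\Lambda$, that is, for
$\{f_\theta: \theta \in \Lambda\}$. If we can find an MP similar test of $H_0: \theta \in \Lambda$ against $H_1$, and if
this test is unbiased size $\alpha$, then necessarily it is MP in the smaller class.

Definition 4. A test $\varphi$ that is UMP among all $\alpha$-similar tests on the boundary
$\Lambda = \bar{\Theta}_0 \cap \bar{\Theta}_1$ is said to be a UMP $\alpha$-similar test.

It is frequently easier to find a UMP $\alpha$-similar test. Moreover, tests that
are UMP similar on the boundary are often UMP unbiased.

Theorem 2. Let the power function of every test $\varphi$ of $H_0: \theta \in \Theta_0$ against
$H_1: \theta \in \Theta_1$ be continuous in $\theta$. Then a UMP $\alpha$-similar test is UMP unbiased,
provided that its size is $\alpha$ for testing $H_0$ against $H_1$.

Proof. Let $\varphi_0$ be UMP $\alpha$-similar. Then $E_\theta \varphi_0(\mathbf{X}) \le \alpha$ for $\theta \in \Theta_0$. Comparing
its power with that of the trivial similar test $\varphi(\mathbf{x}) = \alpha$, we see that $\varphi_0$ is
unbiased also. By the continuity of $\beta_\varphi(\theta)$ and Theorem 1, we see that the class of all unbiased
size $\alpha$ tests is a subclass of the class of all $\alpha$-similar tests. It follows that $\varphi_0$
is a UMP unbiased size $\alpha$ test.

Remark 2. The continuity of the power function $\beta_\varphi(\theta)$ is not always easy to
check. For some sufficient conditions see P.2.13.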


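Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$. We wish to test
$H_0: \mu \le 0$ against $H_1: \mu > 0$. Since the family of densities has an MLR in
$\sum_{i=1}^n X_i$, we can use Theorem 9.4.2 to conclude that a UMP test rejects $H_0$ if
$\sum_{i=1}^n X_i > c$. This test is also UMP unbiased. Nevertheless we use this example
to illustrate the concepts introduced above.

Here $\Theta_0 = \{\mu: \mu \le 0\}$, $\Theta_1 = \{\mu: \mu > 0\}$, $\Lambda = \bar{\Theta}_0 \cap \bar{\Theta}_1 = \{\mu = 0\}$. Also
the power function of any test $\varphi$ is

\[ \beta_\varphi(\mu) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \varphi(x_1, \ldots, x_n)\, \frac{1}{(2\pi)^{n/2}} \exp\Big\{ -\frac{\sum (x_i - \mu)^2}{2} \Big\}\, d\mathbf{x}, \]

which is continuous in $\mu$. It follows that any unbiased test of size $\alpha$ of $H_0$
has the property $\beta_\varphi(0) = \alpha$ of similarity over $\Lambda = \{\mu = 0\}$. To use Theorem
2, we find a UMP test of $H_0': \mu \in \Lambda$ against $H_1$. Let $\mu_1 > 0$. Then the fundamental
lemma gives an MP test of $\mu = 0$ against $\mu = \mu_1$ as

\[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \exp\Big\{ \dfrac{\sum x_i^2}{2} - \dfrac{\sum (x_i - \mu_1)^2}{2} \Big\} > k', \\ 0 & \text{otherwise,} \end{cases} \]

that is,

\[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum x_i > k, \\ 0 & \text{if } \sum x_i \le k, \end{cases} \]

and $k$ is determined from

\[ \alpha = P_0\Big\{ \sum X_i > k \Big\} = P\Big\{ \frac{\sum X_i}{\sqrt{n}} > \frac{k}{\sqrt{n}} \Big\}. \]

Thus $k = \sqrt{n}\, z_\alpha$. Since $\varphi$ is independent of $\mu_1$ as long as $\mu_1 > 0$, we see
that the test

\[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum x_i > \sqrt{n}\, z_\alpha, \\ 0 & \text{otherwise,} \end{cases} \]

is UMP $\alpha$-similar. We need only to check that $\varphi$ is of the right size for
testing $H_0$ against $H_1$. We have, for $\mu \le 0$,

\[ E_\mu \varphi(\mathbf{X}) = P_\mu\Big\{ \sum X_i > \sqrt{n}\, z_\alpha \Big\} = P_\mu\Big\{ \frac{\sum X_i - n\mu}{\sqrt{n}} > z_\alpha - \sqrt{n}\,\mu \Big\} \le P\{ Z > z_\alpha \}, \]

since $-\sqrt{n}\,\mu \ge 0$. Here $Z$ is $\mathcal{N}(0, 1)$. It follows that

\[ E_\mu \varphi(\mathbf{X}) \le \alpha \qquad \text{for } \mu \le 0, \]

hence $\varphi$ is UMP unbiased.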
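
The size and unbiasedness claims of Example 1 can be checked numerically. The sketch below is only an illustration: it assumes the closed form $\beta_\varphi(\mu) = 1 - \Phi(z_\alpha - \sqrt{n}\,\mu)$, which follows from $\sum X_i \sim \mathcal{N}(n\mu, n)$, and the values $n = 25$, $\alpha = .05$ and the function name `power` are chosen here, not taken from the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def power(mu, n, alpha):
    """Power at mu of the test that rejects when sum(x) > sqrt(n) * z_alpha.

    Assumes X_1, ..., X_n iid N(mu, 1), so sum(X) ~ N(n * mu, n).
    """
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)  # upper-alpha point of N(0, 1)
    return 1.0 - STD_NORMAL.cdf(z_alpha - n ** 0.5 * mu)


n, alpha = 25, 0.05  # illustrative values
for mu in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(f"mu = {mu:+.1f}   power = {power(mu, n, alpha):.4f}")
```

The printed power stays below $\alpha$ for $\mu < 0$, equals $\alpha$ on the boundary $\mu = 0$ (similarity), and exceeds $\alpha$ for $\mu > 0$.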

Theorem 2 can be used only if it is possible to find a UMP $\alpha$-similar test.
Unfortunately this requires heavy use of conditional expectation, and we will
not pursue the subject here. We refer to Lehmann [70], Chapters 4 and 5, and
Ferguson [30], pages 224-233, for further details. Here we content ourselves
with the following important results, which we state without proof.


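Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown.

(a) For testing $H_0: \mu \le \mu_0$, $\sigma^2 > 0$ against $H_1: \mu > \mu_0$, $\sigma^2 > 0$, the test

(5) \[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \dfrac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sqrt{[1/(n-1)] \sum (x_i - \bar{x})^2}} \ge c_1, \\ 0 & \text{otherwise,} \end{cases} \]

is UMP unbiased. The UMP unbiased test of $H_0: \mu \ge \mu_0$, $\sigma^2 > 0$
against $H_1: \mu < \mu_0$, $\sigma^2 > 0$ is obtained by reversing the inequalities
in (5).

(b) For testing $H_0: \sigma \le \sigma_0$, $-\infty < \mu < \infty$ against $H_1: \sigma > \sigma_0$,
$-\infty < \mu < \infty$, the test

(6) \[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum (x_i - \bar{x})^2 \ge c_2, \\ 0 & \text{otherwise,} \end{cases} \]

is UMP unbiased. The UMP unbiased test of $H_0: \sigma \ge \sigma_0$, $-\infty < \mu < \infty$,
against $H_1: \sigma < \sigma_0$, $-\infty < \mu < \infty$, is obtained by reversing
the inequalities in (6).

For proof we refer to Lehmann [70], pages 94-97, 163-168.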


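Remark 3. The test defined in (5) and its dual are called one-tailed t-tests,
since the test statistic has a $t(n - 1)$ distribution. The test defined in (6) and
its dual are called one-tailed $\chi^2$-tests, since the test statistic has a $\chi^2$ distribution.
The constants $c_1$ and $c_2$ are determined from

\[ P_{\mu_0}\Big\{ \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sqrt{[1/(n-1)] \sum (X_i - \bar{X})^2}} \ge c_1 \Big\} = \alpha \qquad \text{and} \qquad P_{\sigma_0}\Big\{ \sum (X_i - \bar{X})^2 \ge c_2 \Big\} = \alpha. \]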


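Theorem 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown.

(a) For testing $H_0: \mu = \mu_0$, $\sigma^2 > 0$ against $H_1: \mu \ne \mu_0$, $\sigma^2 > 0$, the test

(7) \[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \dfrac{|\sqrt{n}\,(\bar{x} - \mu_0)|}{\sqrt{[1/(n-1)] \sum (x_i - \bar{x})^2}} > c, \\ 0 & \text{otherwise,} \end{cases} \]

is UMP unbiased.

(b) For testing $H_0: \sigma = \sigma_0$, $-\infty < \mu < \infty$ against $H_1: \sigma \ne \sigma_0$, $-\infty < \mu < \infty$,
the test

(8) \[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum (x_i - \bar{x})^2 < c_1 \text{ or } \sum (x_i - \bar{x})^2 > c_2, \\ 0 & \text{otherwise,} \end{cases} \]

is UMP unbiased.

For proof see Lehmann [70], pages 164-166.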


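Remark 4. The test defined in (7) is known as a two-tailed t-test, and that
defined in (8), as a two-tailed $\chi^2$-test. The constant $c$ is determined from

\[ P_{\mu_0}\{ |t(\mathbf{X})| > c \} = \alpha, \]

where $t(\mathbf{X})$ is the statistic appearing in (7), so that $c = t_{n-1,\alpha/2}$. The constants $c_1$ and $c_2$ are determined from

\[ \int_{c_1/\sigma_0^2}^{c_2/\sigma_0^2} \chi^2_{n-1}(y)\, dy = 1 - \alpha \qquad \text{and} \qquad \int_{c_1/\sigma_0^2}^{c_2/\sigma_0^2} y\, \chi^2_{n-1}(y)\, dy = (n - 1)(1 - \alpha), \]

where $\chi^2_{n-1}(y)$ is the pdf of a $\chi^2$ rv with $n - 1$ d.f. Since
\[ y\, \chi^2_{n-1}(y) = (n - 1)\, \chi^2_{n+1}(y), \]
the second of these conditions can be written as
\[ \int_{c_1/\sigma_0^2}^{c_2/\sigma_0^2} \chi^2_{n+1}(y)\, dy = 1 - \alpha. \]

In practice, one uses the equal tails test, given by

\[ \int_0^{c_1/\sigma_0^2} \chi^2_{n-1}(y)\, dy = \int_{c_2/\sigma_0^2}^{\infty} \chi^2_{n-1}(y)\, dy = \frac{\alpha}{2}, \]

so that $c_1 = \sigma_0^2\, \chi^2_{n-1,1-\alpha/2}$, $c_2 = \sigma_0^2\, \chi^2_{n-1,\alpha/2}$. The equal tails test is a good approximation
to the unbiased test, at least for moderately large $n$.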


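Theorem 5. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be independent samples from
normal distributions $\mathcal{N}(\mu, \sigma^2)$ and $\mathcal{N}(\nu, \sigma^2)$, respectively. For testing the hypotheses
$\mu \le \nu$, $\mu \ge \nu$, and $\mu = \nu$, the tests with rejection regions $t(\mathbf{x}, \mathbf{y}) \ge c_1$,
$t(\mathbf{x}, \mathbf{y}) \le c_2$, and $|t(\mathbf{x}, \mathbf{y})| \ge c_3$, respectively, where

(9) \[ t(\mathbf{x}, \mathbf{y}) = \frac{(\bar{x} - \bar{y}) \big/ \sqrt{1/m + 1/n}}{\sqrt{\big[ \sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2 \big] / (m + n - 2)}}, \]

are UMP unbiased.

Proof. For proof see Lehmann [70], pages 168-173, or Ferguson [30], page
234.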
Remark 5. By Corollary 5 to Theorem 7.5.1 we see that $t(\mathbf{X}, \mathbf{Y}) \sim
t(m + n - 2)$ when $\mu = \nu$. The constants $c_1$, $c_2$, and $c_3$ are determined,
using the central t-distribution for $m + n - 2$ d.f., and are given by
$t_{m+n-2,\alpha}$, $t_{m+n-2,1-\alpha}$, and $t_{m+n-2,\alpha/2}$, respectively. The tests defined in Theorem
5 are called two-sample t-tests.


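Theorem 6. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples
from normal distributions $\mathcal{N}(\mu, \sigma^2)$ and $\mathcal{N}(\nu, \tau^2)$, respectively. For testing
$\sigma^2/\tau^2 \le 1$, $\sigma^2/\tau^2 \ge 1$, and $\sigma^2/\tau^2 = 1$, the tests with rejection regions
$F(\mathbf{x}, \mathbf{y}) \ge c_1$, $F(\mathbf{x}, \mathbf{y}) \le c_2$, and $F(\mathbf{x}, \mathbf{y}) \ge c_3$ or $F(\mathbf{x}, \mathbf{y}) \le c_4$ ($c_4 < c_3$), respectively,
where

(10) \[ F(\mathbf{x}, \mathbf{y}) = \frac{\sum (x_i - \bar{x})^2 / (m - 1)}{\sum (y_i - \bar{y})^2 / (n - 1)}, \]

are UMP unbiased.

For proof see Lehmann [70], pages 169-170.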


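Remark 6. By Corollary 4 to Theorem 7.5.1, $F(\mathbf{X}, \mathbf{Y})$, defined in (10), has
an $F(m - 1, n - 1)$ distribution when $\sigma^2 = \tau^2$. The constants $c_1$ and $c_2$ are
given by $c_1 = F_{m-1,n-1,\alpha}$ and $c_2 = F_{m-1,n-1,1-\alpha}$. The constants $c_3$ and $c_4$ are
determined from

\[ \int_{c_4'}^{c_3'} B_{(m-1)/2,\,(n-1)/2}(y)\, dy = \int_{c_4'}^{c_3'} B_{(m+1)/2,\,(n-1)/2}(y)\, dy = 1 - \alpha, \]

where $B_{\alpha,\beta}(y)$ is the pdf of a $B(\alpha, \beta)$ rv and $c_i' = [(m-1)c_i/(n-1)] \big/ [1 + (m-1)c_i/(n-1)]$, $i = 3, 4$, are the limits
$c_3$, $c_4$ carried over to the beta scale. In practice, one uses the equal
tails test:

\[ \int_0^{c_4} F_{m-1,\,n-1}(y)\, dy = \int_{c_3}^{\infty} F_{m-1,\,n-1}(y)\, dy = \frac{\alpha}{2}, \]

where $F_{\alpha,\beta}(y)$ is the pdf of an $F(\alpha, \beta)$ rv. The tests described in Theorem 6
are called F-tests.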


Yet another reduction is obtained if we apply the principle of invariance
to hypothesis testing problems. We recall that a class of distributions is
invariant under a group of transformations $\mathcal{G}$ if for every $g \in \mathcal{G}$ and every $\theta \in \Theta$
there exists a unique $\theta' \in \Theta$ such that $g(\mathbf{X})$ has distribution $P_{\theta'}$ whenever
$\mathbf{X} \sim P_\theta$. We write $\theta' = \bar{g}\theta$.


Definition 5. A group $\mathcal{G}$ of transformations on the space of values of $\mathbf{X}$
leaves a hypothesis testing problem invariant if $\mathcal{G}$ leaves both $\{P_\theta: \theta \in \Theta_0\}$
and $\{P_\theta: \theta \in \Theta_1\}$ invariant.

Definition 6. We say that $\varphi$ is invariant under $\mathcal{G}$ if
\[ \varphi(g(\mathbf{x})) = \varphi(\mathbf{x}) \qquad \text{for all } \mathbf{x} \text{ and all } g \in \mathcal{G}. \]

Definition 7. Let $\mathcal{G}$ be a group of transformations on the space of values of
the rv $\mathbf{X}$. We say that a statistic $T(\mathbf{x})$ is maximal invariant under $\mathcal{G}$ if (a) $T$
is invariant; (b) $T$ is maximal, that is, $T(\mathbf{x}_1) = T(\mathbf{x}_2) \Rightarrow \mathbf{x}_1 = g(\mathbf{x}_2)$ for some
$g \in \mathcal{G}$.


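Example 2. Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, and $\mathcal{G}$ be the group of translations
\[ g_c(\mathbf{x}) = (x_1 + c, \ldots, x_n + c), \qquad -\infty < c < \infty. \]
Here the space of values of $\mathbf{X}$ is $\mathcal{R}_n$. Consider the statistic
\[ T(\mathbf{x}) = (x_n - x_1, \ldots, x_n - x_{n-1}). \]
Clearly,
\[ T(g_c(\mathbf{x})) = (x_n - x_1, \ldots, x_n - x_{n-1}) = T(\mathbf{x}). \]
If $T(\mathbf{x}) = T(\mathbf{x}')$, then $x_n - x_i = x_n' - x_i'$, $i = 1, 2, \ldots, n - 1$, and we have
$x_i - x_i' = x_n - x_n' = c$ $(i = 1, 2, \ldots, n - 1)$, that is, $g_c(\mathbf{x}') = (x_1' + c, \ldots,
x_n' + c) = \mathbf{x}$, and $T$ is maximal invariant.

Next consider the group of scale changes
\[ g_c(\mathbf{x}) = (c x_1, \ldots, c x_n), \qquad c > 0. \]

Then

\[ T(\mathbf{x}) = \begin{cases} 0 & \text{if all } x_i = 0, \\ \Big( \dfrac{x_1}{z}, \ldots, \dfrac{x_n}{z} \Big) & \text{if at least one } x_i \ne 0, \quad z = \Big( \sum_{1}^{n} x_i^2 \Big)^{1/2}, \end{cases} \]

is maximal invariant; for

\[ T(g_c(\mathbf{x})) = T(c x_1, \ldots, c x_n) = T(\mathbf{x}), \]

and if $T(\mathbf{x}) = T(\mathbf{x}')$, then either $T(\mathbf{x}) = T(\mathbf{x}') = 0$, in which case $x_i = x_i' = 0$,
or $T(\mathbf{x}) = T(\mathbf{x}') \ne 0$, in which case $x_i / z = x_i' / z'$, implying $x_i' = (z'/z)\, x_i = c x_i$,
and $T$ is maximal.

Finally, if we consider the group of translation and scale changes,
\[ g(\mathbf{x}) = (a x_1 + b, \ldots, a x_n + b), \qquad a > 0, \quad -\infty < b < \infty, \]
a maximal invariant is

\[ T(\mathbf{x}) = \begin{cases} 0 & \text{if } \beta = 0, \\ \Big( \dfrac{x_1 - \bar{x}}{\beta}, \ldots, \dfrac{x_n - \bar{x}}{\beta} \Big) & \text{if } \beta \ne 0, \end{cases} \]

where $\bar{x} = n^{-1} \sum_{1}^{n} x_i$ and $\beta^2 = n^{-1} \sum_{1}^{n} (x_i - \bar{x})^2$, $\beta \ge 0$.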
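
The invariance half of Example 2 is easy to confirm numerically. The following sketch is illustrative only: the function names, the sample point, and the transformation constants are hypothetical, and $\beta$ is taken to be the root-mean-square deviation so that the last statistic is unchanged by a change of scale.

```python
import math


def t_translation(x):
    """Differences x_n - x_i (i < n): invariant under x -> x + c."""
    return tuple(x[-1] - xi for xi in x[:-1])


def t_scale(x):
    """x / z with z = sqrt(sum x_i^2), or 0 if x = 0: invariant under x -> c x, c > 0."""
    z = math.sqrt(sum(xi * xi for xi in x))
    return 0 if z == 0 else tuple(xi / z for xi in x)


def t_location_scale(x):
    """(x_i - xbar) / beta, or 0 if beta = 0: invariant under x -> a x + b, a > 0."""
    n = len(x)
    xbar = sum(x) / n
    beta = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
    return 0 if beta == 0 else tuple((xi - xbar) / beta for xi in x)


def close(u, v, tol=1e-9):
    """Componentwise comparison of two tuples up to rounding error."""
    return len(u) == len(v) and all(abs(a - b) < tol for a, b in zip(u, v))


x = [1.0, 4.0, 2.5, -3.0]                 # hypothetical sample point
shifted = [xi + 7.0 for xi in x]          # translation with c = 7
scaled = [3.0 * xi for xi in x]           # scale change with c = 3
affine = [2.0 * xi - 5.0 for xi in x]     # a = 2, b = -5

print(close(t_translation(x), t_translation(shifted)))
print(close(t_scale(x), t_scale(scaled)))
print(close(t_location_scale(x), t_location_scale(affine)))
```

Each comparison prints `True`. Note that `t_translation` is not invariant under scale changes, which is why a different statistic is needed for each group.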


Definition 8. Let $I_\alpha$ denote the class of all invariant size $\alpha$ tests of $H_0: \theta \in \Theta_0$
against $H_1: \theta \in \Theta_1$. If there exists a UMP member in $I_\alpha$, we call the test a
UMP invariant test of $H_0$ against $H_1$.


The search for UMP invariant tests is greatly facilitated by the use of the 
following result. 


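Theorem 7. Let $T(\mathbf{x})$ be a maximal invariant with respect to $\mathcal{G}$. Then $\varphi$ is
invariant under $\mathcal{G}$ if and only if $\varphi$ is a function of $T$.

Proof. Let $\varphi$ be invariant. We have to show that $T(\mathbf{x}_1) = T(\mathbf{x}_2) \Rightarrow
\varphi(\mathbf{x}_1) = \varphi(\mathbf{x}_2)$. If $T(\mathbf{x}_1) = T(\mathbf{x}_2)$, there is a $g \in \mathcal{G}$ such that $\mathbf{x}_1 = g(\mathbf{x}_2)$, so that
$\varphi(\mathbf{x}_1) = \varphi(g(\mathbf{x}_2)) = \varphi(\mathbf{x}_2)$.

Conversely, if $\varphi$ is a function of $T$, $\varphi(\mathbf{x}) = h[T(\mathbf{x})]$, then

\[ \varphi(g(\mathbf{x})) = h[T(g(\mathbf{x}))] = h[T(\mathbf{x})] = \varphi(\mathbf{x}), \]

and $\varphi$ is invariant.

Remark 7. The use of Theorem 7 is obvious. If a hypothesis testing problem
is invariant under a group $\mathcal{G}$, it suffices to restrict attention to test
functions that are functions of a maximal invariant $T$.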


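Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown. We wish to test $H_0: \sigma \ge \sigma_0$, $-\infty < \mu < \infty$, against
$H_1: \sigma < \sigma_0$, $-\infty < \mu < \infty$. The family $\{\mathcal{N}(\mu, \sigma^2)\}$ remains invariant under
translations $x_i' = x_i + c$, $-\infty < c < \infty$. Moreover, since var$(X + c)$ = var$(X)$,
the hypothesis testing problem remains invariant under the group of
translations, that is, both $\{\mathcal{N}(\mu, \sigma^2): \sigma^2 \ge \sigma_0^2\}$ and $\{\mathcal{N}(\mu, \sigma^2): \sigma^2 < \sigma_0^2\}$
remain invariant. The joint sufficient statistic is $(\bar{X}, \sum (X_i - \bar{X})^2)$, which is
transformed to $(\bar{X} + c, \sum (X_i - \bar{X})^2)$ under translations. A maximal invariant
is $\sum (X_i - \bar{X})^2$. It follows that the class of invariant tests consists of tests that
are functions of $\sum (X_i - \bar{X})^2$.

Now $\sum (X_i - \bar{X})^2 / \sigma^2 \sim \chi^2(n - 1)$, so that the pdf of $Z = \sum (X_i - \bar{X})^2$ is
given by

\[ f_{\sigma^2}(z) = \frac{\sigma^{-(n-1)}}{\Gamma[(n-1)/2]\, 2^{(n-1)/2}}\; z^{(n-3)/2}\, e^{-z/2\sigma^2}, \qquad z > 0. \]

The family of densities $\{f_{\sigma^2}: \sigma^2 > 0\}$ has an MLR in $z$, and it follows that
a UMP test is to reject $H_0: \sigma^2 \ge \sigma_0^2$ if $z \le k$, that is, a UMP invariant test
is given by

\[ \varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum (x_i - \bar{x})^2 \le k, \\ 0 & \text{if } \sum (x_i - \bar{x})^2 > k, \end{cases} \]

where $k$ is determined from the size restriction

\[ \alpha = P_{\sigma_0}\Big\{ \sum (X_i - \bar{X})^2 \le k \Big\} = P\Big\{ \frac{\sum (X_i - \bar{X})^2}{\sigma_0^2} \le \frac{k}{\sigma_0^2} \Big\}, \]

that is,
\[ k = \sigma_0^2\, \chi^2_{n-1,1-\alpha}. \]
This was the UMP unbiased test described in Theorem 3.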


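Example 4. Let $\mathbf{X}$ have pdf $f_i(x_1 - \theta, \ldots, x_n - \theta)$ under $H_i$ $(i = 0, 1)$,
$-\infty < \theta < \infty$. Let $\mathcal{G}$ be the group of translations

\[ g_c(\mathbf{x}) = (x_1 + c, \ldots, x_n + c), \qquad -\infty < c < \infty, \quad n \ge 2. \]

Clearly, $g$ induces $\bar{g}$ on $\Theta$, where $\bar{g}\theta = \theta + c$. The hypothesis testing
problem remains invariant under $\mathcal{G}$. A maximal invariant under $\mathcal{G}$ is
$T(\mathbf{X}) = (X_1 - X_n, X_2 - X_n, \ldots, X_{n-1} - X_n) = (T_1, T_2, \ldots, T_{n-1})$. The class of invariant
tests coincides with the class of tests that are functions of $T$. The pdf of $T$
under $H_i$ is independent of $\theta$ and is given by $\int_{-\infty}^{\infty} f_i(t_1 + z, \ldots, t_{n-1} + z, z)\, dz$.
The problem is thus reduced to testing a simple hypothesis against a simple
alternative. By the fundamental lemma the MP test

\[ \varphi(t_1, t_2, \ldots, t_{n-1}) = \begin{cases} 1 & \text{if } \lambda(\mathbf{t}) > c, \\ 0 & \text{if } \lambda(\mathbf{t}) < c, \end{cases} \]

where $\mathbf{t} = (t_1, t_2, \ldots, t_{n-1})$ and

\[ \lambda(\mathbf{t}) = \frac{\int_{-\infty}^{\infty} f_1(t_1 + z, \ldots, t_{n-1} + z, z)\, dz}{\int_{-\infty}^{\infty} f_0(t_1 + z, \ldots, t_{n-1} + z, z)\, dz}, \]

is UMP invariant.

A particular case of Example 4 will be, for instance, to test
$H_0: X \sim \mathcal{N}(\theta, 1)$ against $H_1: X \sim \mathcal{C}(1, \theta)$, $\theta \in \mathcal{R}$.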


We conclude this section by remarking, without proof, that the tests
defined in Theorem 3 are all UMP invariant. Test (7) is also UMP invariant,
but invariance considerations do not reduce the problem of testing $\sigma = \sigma_0$
far enough to lead to a UMP invariant test. The two-sample t-tests in Theorem 5 are
also UMP invariant, as is the F-test of $\sigma^2/\tau^2 \ge 1$ (or $\sigma^2/\tau^2 \le 1$) considered
in Theorem 6. We refer the reader to Lehmann [70], pages 228-229.


PROBLEMS 9.5

1. To test $H_0: X \sim \mathcal{N}(\theta, 1)$ against $H_1: X \sim \mathcal{C}(1, \theta)$, a sample of size 2 is available
on $X$. Find a UMP invariant test of $H_0$ against $H_1$.

2. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$, $n > 3$, have pdf

\[ \frac{1}{\sigma^n}\, f_i\Big( \frac{x_1 - \mu}{\sigma}, \frac{x_2 - \mu}{\sigma}, \ldots, \frac{x_n - \mu}{\sigma} \Big) \]

under $H_i$ $(i = 0, 1)$, $-\infty < \mu < \infty$, $\sigma > 0$, where both $\mu$ and $\sigma$ are unknown and
$f_i$ is even. Show that the problem of testing $H_0$ against $H_1$ remains invariant under
the group of transformations $x_i' = a x_i + b$ $(i = 1, 2, \ldots, n)$, $a \ne 0$, $b \in \mathcal{R}$. Find a
UMP invariant test.
(Lehmann [70], 248-249)

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Find a UMP unbiased size $\alpha$ test
for the null hypothesis $H_0: \lambda \le \lambda_0$ against alternatives $\lambda > \lambda_0$ by the methods of
this section.

4. Let $X \sim NB(1; \theta)$. By the methods of this section find a UMP unbiased size $\alpha$
test of $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$.


CHAPTER 10 


Some Further Results on 
Hypotheses Testing 


10.1 INTRODUCTION 


In this chapter we study some procedures commonly used in the theory of
hypotheses testing. In Section 2 we describe the classical procedure for
constructing tests based on likelihood ratios, which will be used in Chapter
12. Sections 3, 4, and 5 deal with some of the most frequently used tests of
hypotheses, and in Section 6 we look at the problem of testing hypotheses from
a decision-theoretic viewpoint.


10.2 THE LIKELIHOOD RATIO TESTS


In Chapter 9 we saw that UMP tests do not exist for a wide variety of
problems of hypotheses testing. In cases where UMP tests do exist, the
methods apply only to special families of distributions. Moreover, some of
the reductions suggested, such as invariance, do not apply to all families of
distributions.

In this section we consider a procedure for constructing tests that has some
intuitive appeal and that frequently, though not necessarily, leads to UMP
tests or UMP unbiased or invariant tests. Also, the procedure leads to tests
that have some desirable large-sample properties.

Let $\theta \in \Theta \subseteq \mathcal{R}_k$ be a vector of parameters, and let $\mathbf{X}$ be a random vector
with pdf (pmf) $f_\theta$. Consider the problem of testing the null hypothesis
$H_0: \mathbf{X} \sim f_\theta$, $\theta \in \Theta_0$, against the alternatives $H_1: \mathbf{X} \sim f_\theta$, $\theta \in \Theta_1$.

Definition 1. For testing $H_0$ against $H_1$, a test of the form: reject $H_0$ if
and only if $\lambda(\mathbf{x}) < c$, where $c$ is some constant, and

\[ \lambda(\mathbf{x}) = \frac{\sup_{\theta \in \Theta_0} f_\theta(x_1, x_2, \ldots, x_n)}{\sup_{\theta \in \Theta} f_\theta(x_1, x_2, \ldots, x_n)}, \]

is called a likelihood ratio test.


The numerator of the likelihood ratio $\lambda$ is the best explanation of $\mathbf{X}$ (in
the sense of maximum likelihood) that the null hypothesis $H_0$ can provide,
and the denominator is the best possible explanation of $\mathbf{X}$. $H_0$ is rejected if
there is a much better explanation of $\mathbf{X}$ than the best one provided by $H_0$.

It is clear that $0 \le \lambda \le 1$. The constant $c$ is determined from the size
restriction

\[ \sup_{\theta \in \Theta_0} P_\theta\{ \mathbf{x}: \lambda(\mathbf{x}) < c \} = \alpha. \]

If the distribution of $\lambda$ is continuous (that is, the df is absolutely continuous),
any size $\alpha$ is attainable. If, however, $\lambda(\mathbf{X})$ is a discrete rv, it may not be
possible to find a likelihood ratio test whose size exactly equals $\alpha$. This problem
arises because of the nonrandomized nature of the likelihood ratio test and
can be handled by randomization. The following result holds.

Theorem 1. If for given $\alpha$, $0 \le \alpha \le 1$, nonrandomized Neyman-Pearson and
likelihood ratio tests of a simple hypothesis against a simple alternative
exist, they are equivalent.

Proof. The proof is left as an exercise.

Theorem 2. For testing $\theta \in \Theta_0$ against $\theta \in \Theta_1$, the likelihood ratio test is a
function of every sufficient statistic for $\theta$.

Theorem 2 follows from the factorization theorem for sufficient statistics.


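Example 1. Let $X \sim b(n, p)$, and suppose we seek a level $\alpha$ likelihood ratio test of
$H_0: p \le p_0$ against $H_1: p > p_0$:

\[ \lambda(x) = \frac{\sup_{p \le p_0} \binom{n}{x} p^x (1 - p)^{n-x}}{\sup_{0 \le p \le 1} \binom{n}{x} p^x (1 - p)^{n-x}}. \]

Now

\[ \sup_{0 \le p \le 1} p^x (1 - p)^{n-x} = \Big( \frac{x}{n} \Big)^x \Big( 1 - \frac{x}{n} \Big)^{n-x}. \]

The function $p^x (1 - p)^{n-x}$ first increases, then achieves its maximum at
$p = x/n$, and finally decreases, so that

\[ \sup_{p \le p_0} p^x (1 - p)^{n-x} = \begin{cases} p_0^x (1 - p_0)^{n-x} & \text{if } p_0 < \dfrac{x}{n}, \\ \Big( \dfrac{x}{n} \Big)^x \Big( 1 - \dfrac{x}{n} \Big)^{n-x} & \text{if } \dfrac{x}{n} \le p_0. \end{cases} \]

It follows that

\[ \lambda(x) = \begin{cases} \dfrac{p_0^x (1 - p_0)^{n-x}}{(x/n)^x\, [1 - (x/n)]^{n-x}} & \text{if } p_0 < \dfrac{x}{n}, \\ 1 & \text{if } \dfrac{x}{n} \le p_0. \end{cases} \]

Note that $\lambda(x) \le 1$ for $np_0 < x$ and $\lambda(x) = 1$ if $x \le np_0$, and it follows that
$\lambda(x)$ is a decreasing function of $x$. Thus $\lambda(x) < c$ if and only if $x > c'$, and
the likelihood ratio test rejects $H_0$ if $x > c'$.

The likelihood ratio test is of the type obtained in Section 9.4 for families
with an MLR except for the boundary $\lambda(x) = c$. In other words, if the size of
the test happens to be exactly $\alpha$, the likelihood ratio test is a UMP level $\alpha$
test. Since $X$ is a discrete rv, however, to obtain size $\alpha$ may not be possible.
We have

\[ \alpha = \sup_{p \le p_0} P_p\{ X > c' \} = P_{p_0}\{ X > c' \}. \]

If such a $c'$ does not exist, we choose an integer $c'$ such that
\[ P_{p_0}\{ X > c' \} \le \alpha \qquad \text{and} \qquad P_{p_0}\{ X > c' - 1 \} > \alpha. \]
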
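
The binomial likelihood ratio and the choice of the integer $c'$ can be traced in a few lines. The sketch below is an illustration only: the helper names and the values $n = 10$, $p_0 = .5$, $\alpha = .05$ are assumptions made here, not part of the example.

```python
from math import comb


def lam(x, n, p0):
    """Likelihood ratio for H0: p <= p0 against H1: p > p0 when X ~ b(n, p)."""
    if x <= n * p0:
        return 1.0
    phat = x / n
    num = p0 ** x * (1 - p0) ** (n - x)
    den = phat ** x * (1 - phat) ** (n - x)  # 0.0 ** 0 == 1.0 covers x == n
    return num / den


def upper_tail(c, n, p):
    """P_p{X > c} for X ~ b(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1, n + 1))


def critical_value(n, p0, alpha):
    """Smallest integer c' with P_{p0}{X > c'} <= alpha (so P_{p0}{X > c' - 1} > alpha)."""
    c = 0
    while upper_tail(c, n, p0) > alpha:
        c += 1
    return c


n, p0, alpha = 10, 0.5, 0.05  # illustrative values
ratios = [lam(x, n, p0) for x in range(n + 1)]
c_prime = critical_value(n, p0, alpha)
print([round(r, 4) for r in ratios])
print(c_prime, upper_tail(c_prime, n, p0), upper_tail(c_prime - 1, n, p0))
```

With these values the ratios are nonincreasing in $x$, and the attained size $P_{p_0}\{X > c'\}$ is strictly below $\alpha$, which is the discreteness problem described in the text.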
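Example 2. Consider the problem of testing $\mu = \mu_0$ against $\mu \ne \mu_0$ in sampling
from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. In this case
$\Theta_0 = \{(\mu_0, \sigma^2): \sigma^2 > 0\}$ and $\Theta = \{(\mu, \sigma^2): -\infty < \mu < \infty, \sigma^2 > 0\}$. We write
$\theta = (\mu, \sigma^2)$:

\[ \sup_{\theta \in \Theta_0} f_\theta(\mathbf{x}) = \sup_{\sigma^2 > 0} \Big[ \frac{1}{(\sigma \sqrt{2\pi})^n} \exp\Big\{ -\frac{\sum_1^n (x_i - \mu_0)^2}{2\sigma^2} \Big\} \Big] = f_{\hat{\sigma}_0^2}(\mathbf{x}), \]

where $\hat{\sigma}_0^2$ is the MLE, $\hat{\sigma}_0^2 = (1/n) \sum_1^n (x_i - \mu_0)^2$. Thus

\[ \sup_{\theta \in \Theta_0} f_\theta(\mathbf{x}) = \frac{1}{(2\pi/n)^{n/2} \big\{ \sum_1^n (x_i - \mu_0)^2 \big\}^{n/2}}\, e^{-n/2}. \]

The MLE of $\theta = (\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown is $\big( \sum_1^n x_i / n,\;
\sum_1^n (x_i - \bar{x})^2 / n \big)$. It follows that

\[ \sup_{\theta \in \Theta} f_\theta(\mathbf{x}) = \sup_{\mu, \sigma^2} \Big[ \frac{1}{(\sigma \sqrt{2\pi})^n} \exp\Big\{ -\frac{\sum_1^n (x_i - \mu)^2}{2\sigma^2} \Big\} \Big] = \frac{1}{(2\pi/n)^{n/2} \big\{ \sum_1^n (x_i - \bar{x})^2 \big\}^{n/2}}\, e^{-n/2}. \]

Thus, since $\sum (x_i - \mu_0)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2$,

\[ \lambda(\mathbf{x}) = \Big\{ \frac{\sum_1^n (x_i - \bar{x})^2}{\sum_1^n (x_i - \mu_0)^2} \Big\}^{n/2} = \Big\{ \frac{1}{1 + \big[ n(\bar{x} - \mu_0)^2 / \sum_1^n (x_i - \bar{x})^2 \big]} \Big\}^{n/2}. \]

The likelihood ratio test rejects $H_0$ if

\[ \lambda(\mathbf{x}) < c, \]

and since $\lambda(\mathbf{x})$ is a decreasing function of $n(\bar{x} - \mu_0)^2 / \sum_1^n (x_i - \bar{x})^2$, we reject
$H_0$ if

\[ \Big| \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sqrt{\sum_1^n (x_i - \bar{x})^2}} \Big| > c', \]

that is, if

\[ \Big| \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s} \Big| > c'', \]

where $s^2 = (n - 1)^{-1} \sum_1^n (x_i - \bar{x})^2$. The statistic

\[ t(\mathbf{X}) = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{S} \]

has a t-distribution with $n - 1$ d.f. Under $H_0: \mu = \mu_0$, $t(\mathbf{X})$ has a central
$t(n - 1)$ distribution, but under $H_1: \mu \ne \mu_0$, $t(\mathbf{X})$ has a noncentral t-distribution
with $n - 1$ d.f. and noncentrality parameter $\delta = \sqrt{n}\,(\mu - \mu_0)/\sigma$. We choose
$c'' = t_{n-1,\alpha/2}$ in accordance with the distribution of $t(\mathbf{X})$ under $H_0$. Note
that the two-sided t-test obtained here is UMP unbiased (Theorem 9.5.4).
Similarly one can obtain one-sided t-tests also as likelihood ratio tests.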


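Example 3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random
samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$, respectively. We wish to test the
null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \ne \sigma_2^2$. Here

\[ \Theta = \{ (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2): -\infty < \mu_i < \infty,\ \sigma_i^2 > 0,\ i = 1, 2 \} \]

and

\[ \Theta_0 = \{ (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2): -\infty < \mu_i < \infty,\ i = 1, 2,\ \sigma_1^2 = \sigma_2^2 > 0 \}. \]

Let $\theta = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. Then the joint pdf is

\[ f_\theta(\mathbf{x}, \mathbf{y}) = \frac{1}{(2\pi)^{(m+n)/2}\, \sigma_1^m\, \sigma_2^n} \exp\Big\{ -\frac{1}{2\sigma_1^2} \sum_1^m (x_i - \mu_1)^2 - \frac{1}{2\sigma_2^2} \sum_1^n (y_i - \mu_2)^2 \Big\}. \]

Also,

\[ \log f_\theta(\mathbf{x}, \mathbf{y}) = -\frac{m + n}{2} \log 2\pi - \frac{m}{2} \log \sigma_1^2 - \frac{n}{2} \log \sigma_2^2 - \frac{1}{2\sigma_1^2} \sum_1^m (x_i - \mu_1)^2 - \frac{1}{2\sigma_2^2} \sum_1^n (y_i - \mu_2)^2. \]

Differentiating with respect to $\mu_1$ and $\mu_2$, we obtain the MLE's

\[ \hat{\mu}_1 = \bar{x}, \qquad \hat{\mu}_2 = \bar{y}. \]

Differentiating with respect to $\sigma_1^2$ and $\sigma_2^2$, we obtain the MLE's

\[ \hat{\sigma}_1^2 = \frac{1}{m} \sum_1^m (x_i - \bar{x})^2, \qquad \hat{\sigma}_2^2 = \frac{1}{n} \sum_1^n (y_i - \bar{y})^2. \]

If, however, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the MLE of $\sigma^2$ is

\[ \hat{\sigma}^2 = \frac{\sum_1^m (x_i - \bar{x})^2 + \sum_1^n (y_i - \bar{y})^2}{m + n}. \]

Thus

\[ \sup_{\theta \in \Theta_0} f_\theta(\mathbf{x}, \mathbf{y}) = \frac{e^{-(m+n)/2}}{[2\pi/(m + n)]^{(m+n)/2} \big\{ \sum_1^m (x_i - \bar{x})^2 + \sum_1^n (y_i - \bar{y})^2 \big\}^{(m+n)/2}} \]

and

\[ \sup_{\theta \in \Theta} f_\theta(\mathbf{x}, \mathbf{y}) = \frac{e^{-(m+n)/2}}{(2\pi/m)^{m/2} (2\pi/n)^{n/2} \big\{ \sum_1^m (x_i - \bar{x})^2 \big\}^{m/2} \big\{ \sum_1^n (y_i - \bar{y})^2 \big\}^{n/2}}, \]

so that

\[ \lambda(\mathbf{x}, \mathbf{y}) = \Big( \frac{m + n}{m} \Big)^{m/2} \Big( \frac{m + n}{n} \Big)^{n/2} \frac{\big\{ \sum_1^m (x_i - \bar{x})^2 \big\}^{m/2} \big\{ \sum_1^n (y_i - \bar{y})^2 \big\}^{n/2}}{\big\{ \sum_1^m (x_i - \bar{x})^2 + \sum_1^n (y_i - \bar{y})^2 \big\}^{(m+n)/2}}. \]

Writing

\[ f = \frac{\sum_1^m (x_i - \bar{x})^2 / (m - 1)}{\sum_1^n (y_i - \bar{y})^2 / (n - 1)}, \]

we have

\[ \lambda(\mathbf{x}, \mathbf{y}) = \Big( \frac{m + n}{m} \Big)^{m/2} \Big( \frac{m + n}{n} \Big)^{n/2} \frac{1}{\{ 1 + [(n - 1)/(m - 1)](1/f) \}^{m/2}\, \{ 1 + [(m - 1)/(n - 1)] f \}^{n/2}}. \]

We leave the reader to check that $\lambda(\mathbf{x}, \mathbf{y}) < c$ is equivalent to $f < c_1$ or
$f > c_2$. (Take logarithms, and use properties of convex functions. Alternatively,
differentiate $\log \lambda$.)

Under $H_0$, the statistic

\[ F = \frac{\sum_1^m (X_i - \bar{X})^2 / (m - 1)}{\sum_1^n (Y_i - \bar{Y})^2 / (n - 1)} \]

has an $F(m - 1, n - 1)$ distribution, so that $c_1$, $c_2$ can be selected. It is usual
to take

\[ P\{ F \le c_1 \} = P\{ F \ge c_2 \} = \frac{\alpha}{2}. \]

Under $H_1$, $(\sigma_2^2 / \sigma_1^2) F$ has an $F(m - 1, n - 1)$ distribution.
We remark that the likelihood ratio test in this case coincides with the
UMP unbiased test (Theorem 9.5.6).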


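In certain situations the likelihood ratio test does not perform well. We
reproduce here an example due to Stein and Rubin.

Example 4. Let $X$ be a discrete rv with pmf

\[ P_{p=0}\{ X = x \} = \begin{cases} \dfrac{\alpha}{2} & \text{if } x = \pm 2, \\ \dfrac{1 - 2\alpha}{2} & \text{if } x = \pm 1, \\ \alpha & \text{if } x = 0, \end{cases} \]

under the null hypothesis $H_0: p = 0$, and

\[ P_p\{ X = x \} = \begin{cases} pc & \text{if } x = -2, \\ \dfrac{1 - c}{1 - \alpha} \Big( \dfrac{1}{2} - \alpha \Big) & \text{if } x = \pm 1, \\ \alpha\, \dfrac{1 - c}{1 - \alpha} & \text{if } x = 0, \\ (1 - p) c & \text{if } x = 2, \end{cases} \]

under the alternative $H_1: p \in (0, 1)$, where $\alpha$ and $c$ are constants with

\[ 0 < \alpha < \frac{1}{2} \qquad \text{and} \qquad \frac{\alpha}{2 - \alpha} < c < \alpha. \]

To test the simple null hypothesis against the composite alternative at the
level of significance $\alpha$, let us compute the likelihood ratio $\lambda$. We have

\[ \lambda(2) = \frac{P_0\{ X = 2 \}}{\sup_{0 \le p < 1} P_p\{ X = 2 \}} = \frac{\alpha/2}{c} = \frac{\alpha}{2c}, \]

since $\alpha/2 < c$. Similarly $\lambda(-2) = \alpha/(2c)$. Also

\[ \lambda(1) = \lambda(-1) = \frac{\tfrac{1}{2} - \alpha}{[(1 - c)/(1 - \alpha)]\big( \tfrac{1}{2} - \alpha \big)} = \frac{1 - \alpha}{1 - c}, \qquad c < \alpha, \]

and

\[ \lambda(0) = \frac{1 - \alpha}{1 - c}. \]

The test rejects $H_0$ if $\lambda(x) < k$, where $k$ is to be determined so that the
level is $\alpha$. We see that

\[ P_0\Big\{ \lambda(X) \le \frac{\alpha}{2c} \Big\} = P_0\{ X = \pm 2 \} = \alpha, \]

provided that $\alpha/(2c) < (1 - \alpha)/(1 - c)$. But $\alpha/(2 - \alpha) < c < \alpha$ implies
$\alpha < 2c - c\alpha$, so that $\alpha - c\alpha < 2c - 2c\alpha$, or $\alpha(1 - c) < 2c(1 - \alpha)$, as
required. Thus the likelihood ratio size $\alpha$ test is to reject $H_0$ if $X = \pm 2$. The
power of the likelihood ratio test is

\[ P_p\Big\{ \lambda(X) \le \frac{\alpha}{2c} \Big\} = P_p\{ X = \pm 2 \} = pc + (1 - p)c = c < \alpha \]

for all $p \in (0, 1)$. The test is not unbiased and is even worse than the trivial
test $\varphi(x) = \alpha$.

Another test that is better than the trivial test is to reject $H_0$ whenever
$x = 0$ (this is opposite to what the likelihood ratio test says). Then

\[ P_0\{ X = 0 \} = \alpha, \]

\[ P_p\{ X = 0 \} = \alpha\, \frac{1 - c}{1 - \alpha} > \alpha \qquad (\text{since } c < \alpha), \]

for all $p \in (0, 1)$, and the test is unbiased.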
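
The Stein and Rubin example can be verified with exact rational arithmetic. The sketch below is an illustration only: the particular constants $\alpha = 1/10$, $c = 2/25$, $p = 3/10$ and the function names are choices made here, subject to the constraints $0 < \alpha < 1/2$ and $\alpha/(2 - \alpha) < c < \alpha$ stated in the example.

```python
from fractions import Fraction as Fr

alpha = Fr(1, 10)   # illustrative level, 0 < alpha < 1/2
c = Fr(2, 25)       # illustrative constant with alpha / (2 - alpha) < c < alpha
SUPPORT = (-2, -1, 0, 1, 2)


def pmf_null(x):
    """pmf of X under H0: p = 0."""
    side = (1 - 2 * alpha) / 2
    return {-2: alpha / 2, -1: side, 0: alpha, 1: side, 2: alpha / 2}[x]


def pmf_alt(x, p):
    """pmf of X under H1 for a given p in (0, 1)."""
    r = (1 - c) / (1 - alpha)
    side = r * (Fr(1, 2) - alpha)
    return {-2: p * c, -1: side, 0: r * alpha, 1: side, 2: (1 - p) * c}[x]


p = Fr(3, 10)  # any p in (0, 1) gives the same two powers
size_lrt = pmf_null(-2) + pmf_null(2)         # LRT rejects when X = +2 or -2
power_lrt = pmf_alt(-2, p) + pmf_alt(2, p)
size_zero = pmf_null(0)                       # competing test rejects when X = 0
power_zero = pmf_alt(0, p)

print(size_lrt, power_lrt)    # size alpha, power c < alpha
print(size_zero, power_zero)  # size alpha, power above alpha
```

Both tests have size $\alpha$, but the likelihood ratio test has power $c < \alpha$, while the test that rejects at $x = 0$ has power $\alpha(1 - c)/(1 - \alpha) > \alpha$.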


We will use the likelihood ratio procedure quite frequently hereafter. We
conclude this discussion with the following large-sample property of the
likelihood ratio, which solves the problem of finding the distribution of the
test statistic, at least approximately.

Theorem 3. Under some regularity conditions on $f_\theta(\mathbf{x})$, the random variable
$-2 \log \lambda(\mathbf{X})$ is asymptotically distributed as a chi-square rv with degrees
of freedom equal to the difference between the number of independent
parameters in $\Theta$ and the number in $\Theta_0$.

We will not prove this result here; the reader is referred to Wilks [141],
page 419. The regularity conditions are essentially the ones associated with
Theorem 8.7.4. In Example 2 the number of parameters unspecified under
$H_0$ is one (namely, $\sigma^2$), and under $H_1$ two parameters are unspecified ($\mu$ and
$\sigma^2$), so that the asymptotic chi-square distribution will have 1 d.f. Similarly,
in Example 3, the d.f. $= 4 - 3 = 1$.


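Example 5. In Example 2 we showed that, in sampling from a normal
population with unknown mean $\mu$ and unknown variance $\sigma^2$, the likelihood
ratio for testing $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$ is

\[ \lambda(\mathbf{x}) = \Big\{ 1 + \frac{n(\bar{x} - \mu_0)^2}{\sum_1^n (x_i - \bar{x})^2} \Big\}^{-n/2}. \]

Thus

\[ -2 \log \lambda(\mathbf{X}) = n \log \Big\{ 1 + \frac{n(\bar{X} - \mu_0)^2}{\sum_1^n (X_i - \bar{X})^2} \Big\}. \]

Under $H_0$, $\sqrt{n}\,(\bar{X} - \mu_0)/\sigma \sim \mathcal{N}(0, 1)$ and $\sum_1^n (X_i - \bar{X})^2 / \sigma^2 \sim \chi^2(n - 1)$.
Moreover, $\bar{X}$ and $\sum_1^n (X_i - \bar{X})^2$ are independent. It follows that
$\sqrt{n}\,(\bar{X} - \mu_0) \big/ \sqrt{\sum_1^n (X_i - \bar{X})^2 / (n - 1)}$ has a (central) t-distribution with
$(n - 1)$ d.f.; hence $n(n - 1)(\bar{X} - \mu_0)^2 / \sum_1^n (X_i - \bar{X})^2$ is an F-statistic with
$(1, n - 1)$ d.f. Therefore we can write for the mgf of $-2 \log \lambda(\mathbf{X})$

\[ M_n(t) = E_{H_0} \exp\{ -2t \log \lambda(\mathbf{X}) \} = E_{H_0} \exp\Big\{ nt \log \Big( 1 + \frac{F}{n - 1} \Big) \Big\}, \]

where $F$ has an $F(1, n - 1)$ distribution. Thus

\[ M_n(t) = E \Big( 1 + \frac{F}{n - 1} \Big)^{nt} = \frac{\Gamma(n/2)}{\Gamma[(n - 1)/2]\, \Gamma(\tfrac{1}{2})\, (n - 1)^{1/2}} \int_0^\infty \Big( 1 + \frac{f}{n - 1} \Big)^{nt - n/2} f^{-1/2}\, df. \]

Let us substitute $y = 1/\{ 1 + [f/(n - 1)] \}$. Then
\[ f = (n - 1)\, \frac{1 - y}{y}, \qquad df = -\frac{n - 1}{y^2}\, dy. \]
We have

\[ \int_0^\infty \Big( 1 + \frac{f}{n - 1} \Big)^{nt - n/2} f^{-1/2}\, df = (n - 1)^{1/2} \int_0^1 y^{(n-3)/2 - nt} (1 - y)^{-1/2}\, dy = (n - 1)^{1/2}\, B\Big( \frac{n - 1}{2} - nt,\ \frac{1}{2} \Big) \qquad \text{for } t < \frac{1}{2} - \frac{1}{2n}. \]

Thus

\[ M_n(t) = \frac{\Gamma(n/2)}{\Gamma[(n - 1)/2]} \cdot \frac{\Gamma[(n - 1)/2 - nt]}{\Gamma[(n/2) - nt]} \qquad \text{for } t < \frac{1}{2} - \frac{1}{2n}. \]

As $n \to \infty$, we treat $\Gamma(a(n) + 1)$ as $(a(n))!$, and applying Stirling's approximation
(see P.2.8) to $(a(n))!$, we see that

\[ M_n(t) \to (1 - 2t)^{-1/2} \qquad \text{as } n \to \infty. \]

It follows by the continuity theorem that $-2 \log \lambda(\mathbf{X}) \xrightarrow{L} Y$, where
$Y \sim \chi^2(1)$. This result is consistent with Theorem 3.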

PROBLEMS 10.2

1. Prove Theorem 1.

2. Prove Theorem 2.

3. Find the likelihood ratio test of $H_0: p = p_0$ against $H_1: p \ne p_0$, based on a
sample of size 1 from $b(n, p)$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are
unknown. Find the likelihood ratio test of $H_0: \sigma = \sigma_0$ against $H_1: \sigma \ne \sigma_0$.

5. Let $X_1, X_2, \ldots, X_n$ be a sample from pmf
\[ P_N\{ X = j \} = \frac{1}{N}, \qquad j = 1, 2, \ldots, N; \quad N \ge 1 \text{ is an integer.} \]
(a) Find the likelihood ratio test of $H_0: N \le N_0$ against $H_1: N > N_0$.
(b) Find the likelihood ratio test of $H_0: N = N_0$ against $H_1: N \ne N_0$.

6. For a sample of size 1 from pdf

\[ f_\theta(x) = \frac{2}{\theta^2} (\theta - x), \qquad 0 < x < \theta, \]

find the likelihood ratio test of $\theta = \theta_0$ against $\theta \ne \theta_0$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \beta)$.
(a) Find the likelihood ratio test of $\beta = \beta_0$ against $\beta \ne \beta_0$.
(b) Find the likelihood ratio test of $\beta \le \beta_0$ against $\beta > \beta_0$.

8. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate
normal population with $EX_i = \mu_1$, $EY_i = \mu_2$, var$(X_i) = \sigma^2$, var$(Y_i) = \sigma^2$, and
cov$(X_i, Y_i) = \rho\sigma^2$. Show that the likelihood ratio test of the null hypothesis $H_0: \rho = 0$
against $H_1: \rho \ne 0$ reduces to rejecting $H_0$ if $|R| > c$, where $R = 2S_{11}/(S_1^2 + S_2^2)$,
$S_{11}$, $S_1^2$, and $S_2^2$ being the sample covariance and the sample variances, respectively.
(For the pdf of the test statistic $R$, see Problem 7.6.1.)


10.3 THE CHI-SQUARE TESTS 


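In this section we consider some tests based on the chi-square statistic for
some parametric hypotheses. The types of problems discussed here arise quite
often in practice. Chi-square tests are also used for testing some nonparametric
hypotheses and will be taken up again in Chapter 13.

We first consider some tests concerning variances. Given a sample of size
$n$ from a normal population $\mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is unknown, we need to
test a hypothesis of the type $\sigma^2 \ge \sigma_0^2$, $\sigma^2 \le \sigma_0^2$, or $\sigma^2 = \sigma_0^2$, where $\sigma_0$ is
some given positive number. In the following table we summarize the tests.

                                          Reject $H_0$ at level $\alpha$ if
       $H_0$                    $H_1$                      $\mu$ known                                                   $\mu$ unknown

I.     $\sigma \ge \sigma_0$    $\sigma < \sigma_0$        $\sum_1^n (x_i - \mu)^2 \le \chi^2_{n,1-\alpha}\, \sigma_0^2$          $s^2 \le \dfrac{\sigma_0^2}{n-1}\, \chi^2_{n-1,1-\alpha}$

II.    $\sigma \le \sigma_0$    $\sigma > \sigma_0$        $\sum_1^n (x_i - \mu)^2 \ge \chi^2_{n,\alpha}\, \sigma_0^2$            $s^2 \ge \dfrac{\sigma_0^2}{n-1}\, \chi^2_{n-1,\alpha}$

III.   $\sigma = \sigma_0$      $\sigma \ne \sigma_0$      $\sum_1^n (x_i - \mu)^2 \le \chi^2_{n,1-\alpha/2}\, \sigma_0^2$        $s^2 \le \dfrac{\sigma_0^2}{n-1}\, \chi^2_{n-1,1-\alpha/2}$
                                                           or                                                            or
                                                           $\sum_1^n (x_i - \mu)^2 \ge \chi^2_{n,\alpha/2}\, \sigma_0^2$          $s^2 \ge \dfrac{\sigma_0^2}{n-1}\, \chi^2_{n-1,\alpha/2}$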

Remark 1. All these tests can be derived by the standard likelihood ratio
procedure. From Theorem 9.5.3 we see that, if $\mu$ is unknown, tests I and
II are UMP unbiased (and UMP invariant). In fact, test II is UMP (see
Lehmann [70], 95). If $\mu$ is known, tests I and II are UMP (see Example
9.4.5). For tests III we have chosen constants $c_1$, $c_2$ so that each tail has
probability $\alpha/2$. This is the customary procedure, even though it destroys
the unbiasedness property of the tests, at least for small samples.


Example 1. A manufacturer claims that the lifetime of a certain brand of
batteries produced by his factory has a variance of 5000 (hours)$^2$. A sample
of size 26 has a variance of 7200 (hours)$^2$. Assuming that it is reasonable
to treat these data as a random sample from a normal population, let
us test the manufacturer's claim at the $\alpha = .02$ level. Here $H_0: \sigma^2 = 5000$
is to be tested against $H_1: \sigma^2 \ne 5000$. We reject $H_0$ if either

\[ s^2 = 7200 \le \frac{\sigma_0^2}{n - 1}\, \chi^2_{n-1,1-\alpha/2} \qquad \text{or} \qquad s^2 = 7200 \ge \frac{\sigma_0^2}{n - 1}\, \chi^2_{n-1,\alpha/2}. \]

We have

\[ \frac{\sigma_0^2}{n - 1}\, \chi^2_{n-1,1-\alpha/2} = \frac{5000}{25} \times 11.524 = 2304.8, \]

\[ \frac{\sigma_0^2}{n - 1}\, \chi^2_{n-1,\alpha/2} = \frac{5000}{25} \times 44.314 = 8862.8. \]

Since $s^2$ is neither $\le 2304.8$ nor $\ge 8862.8$, we cannot reject $H_0$ at level .02.
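
The arithmetic of the battery example can be retraced as follows. This is a minimal sketch: the two chi-square percentage points are copied from the example rather than computed, and the variable names are chosen here for illustration.

```python
sigma0_sq = 5000.0   # variance claimed under H0, in (hours)^2
n = 26               # sample size
s_sq = 7200.0        # observed sample variance, in (hours)^2

# Percentage points of chi-square(25) for alpha = .02, as quoted in the example.
chi2_lower = 11.524  # chi^2_{25, 1 - alpha/2}
chi2_upper = 44.314  # chi^2_{25, alpha/2}

scale = sigma0_sq / (n - 1)
lower_cut = scale * chi2_lower
upper_cut = scale * chi2_upper
reject = s_sq <= lower_cut or s_sq >= upper_cut

print(round(lower_cut, 1), round(upper_cut, 1), reject)
```

The observed variance falls between the two cutoffs, so the claim is not rejected at level .02.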


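A test based on a chi-square statistic is also used for testing the equality
of several proportions. Let $X_1, X_2, \ldots, X_k$ be independent rv's with

\[ X_i \sim b(n_i, p_i), \qquad i = 1, 2, \ldots, k, \quad k \ge 2. \]

Theorem 1. The rv $\sum_{i=1}^k \{ (X_i - n_i p_i) / \sqrt{n_i p_i (1 - p_i)} \}^2$ converges in distribution
to the $\chi^2(k)$ rv as $n_1, n_2, \ldots, n_k \to \infty$.

Proof. The proof is left as an exercise.

If $n_1, n_2, \ldots, n_k$ are large, we can use Theorem 1 to test $H_0: p_1 = p_2 = \cdots
= p_k = p$ against all alternatives. If $p$ is known, we compute

\[ y = \sum_{i=1}^k \Big\{ \frac{x_i - n_i p}{\sqrt{n_i p (1 - p)}} \Big\}^2, \]

and if $y \ge \chi^2_{k,\alpha}$, we reject $H_0$. In practice $p$ will be unknown. Let $\mathbf{p} = (p_1,
p_2, \ldots, p_k)$. Then the likelihood function is

\[ L(\mathbf{p}; x_1, x_2, \ldots, x_k) = \prod_{i=1}^k \binom{n_i}{x_i} p_i^{x_i} (1 - p_i)^{n_i - x_i}, \]

so that

\[ \log L(\mathbf{p}; \mathbf{x}) = \sum_{i=1}^k \log \binom{n_i}{x_i} + \sum_{i=1}^k x_i \log p_i + \sum_{i=1}^k (n_i - x_i) \log (1 - p_i). \]

The MLE $\hat{p}$ of $p$, under $H_0$, is therefore given by

\[ \frac{\sum_{i=1}^k x_i}{p} - \frac{\sum_{i=1}^k (n_i - x_i)}{1 - p} = 0, \]

that is,

\[ \hat{p} = \frac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k}. \]

Under certain regularity assumptions (see Cramér [18], 426-427) it can be
shown that the statistic

(1) \[ Y_1 = \sum_{i=1}^k \frac{(X_i - n_i \hat{p})^2}{n_i \hat{p} (1 - \hat{p})} \]

is asymptotically $\chi^2(k - 1)$. Thus the test rejects $H_0: p_1 = p_2 = \cdots = p_k = p$,
$p$ unknown, at level $\alpha$ if $y_1 \ge \chi^2_{k-1,\alpha}$.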
It should be remembered that the tests based on Theorem 1 are all
large-sample tests and hence not exact, in contrast to the tests concerning the
variance discussed above, which are all exact tests. In the case $k = 1$, UMP
tests of $p \ge p_0$ and $p \le p_0$ exist and can be obtained by the MLR method
described in Section 9.4. For testing $p = p_0$, the usual test is UMP unbiased.
In the case $k = 2$, it is possible to find exact UMP unbiased tests of
$p_1 \le p_2$, $p_1 \ge p_2$, and $p_1 = p_2$. But the tests are based on the conditional
distribution of $X_1$, given $X_1 + X_2$, and are too complicated to describe here.
We refer the reader to Lehmann [70], page 143. If $n_1$ and $n_2$ are large, a test
based on the normal distribution can be used instead of Theorem 1. In this
case the statistic

\[ Z = \frac{X_1/n_1 - X_2/n_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}, \]

where $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$, is asymptotically $\mathcal{N}(0, 1)$ under $H_0$:
$p_1 = p_2 = p$. If $p$ is known, one uses $p$ instead of $\hat{p}$. It is not too difficult to
show that $Z^2$ is equal to $Y_1$, so that the two tests are equivalent.
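
The identity $Z^2 = Y_1$ for $k = 2$ can be checked numerically before proving it. The sketch below is an illustration only; the counts and sample sizes are hypothetical, and the function names are chosen here.

```python
from math import sqrt


def pooled_estimate(xs, ns):
    """MLE of the common p under H0: p_1 = ... = p_k."""
    return sum(xs) / sum(ns)


def y1_statistic(xs, ns):
    """Y_1 = sum (x_i - n_i phat)^2 / (n_i phat (1 - phat))."""
    phat = pooled_estimate(xs, ns)
    return sum((x - n * phat) ** 2 / (n * phat * (1 - phat)) for x, n in zip(xs, ns))


def z_statistic(x1, n1, x2, n2):
    """Two-sample normal statistic with the pooled estimate of p."""
    phat = pooled_estimate([x1, x2], [n1, n2])
    return (x1 / n1 - x2 / n2) / sqrt(phat * (1 - phat) * (1 / n1 + 1 / n2))


x1, n1, x2, n2 = 45, 100, 60, 150  # hypothetical data
z = z_statistic(x1, n1, x2, n2)
y1 = y1_statistic([x1, x2], [n1, n2])
print(z ** 2, y1)
```

The two printed values agree up to rounding error, so rejecting for large $|Z|$ and rejecting for large $Y_1$ lead to the same decision when matching critical values are used.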

In applications a wide variety of problems can be reduced to the multinomial
distribution model. We therefore consider the problem of testing the
parameters of a multinomial distribution. Let $(X_1, X_2, \ldots, X_{k-1})$ be a sample
from a multinomial distribution with parameters $n, p_1, p_2, \ldots, p_{k-1}$, and let us
write $X_k = n - X_1 - \cdots - X_{k-1}$ and $p_k = 1 - p_1 - \cdots - p_{k-1}$. The difference
between the model of Theorem 1 and the multinomial model is the independence
of the $X_i$'s.
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Theorem 2. Let (Xp Xs; +--, у) be a multinomial rv with parameters 
п; р, Po ***> Pii; Then the rv 


Š С; - тў) 
3 0, = Ao I PED. 
@) =È [gos 
is asymptotically distributed as a x — 1) гу (as л — оо). 
Proof. For the general proof we refer the reader to Cramér [18], pages 


417-419. We will consider here the К = 2 case.to make the result a little 
more plausible. We have 


и, = GG mpi)? ү, OG = пра)? _ O6 = nay ү ee i п(1 — por 


пру пр; пру - p) 
= 06 - npy ia + ar ay] 
00 my 
пр(1= ру) ` 


It follows from Theorem 1 that U; 5. Y as n > co, where Y ~ (1). 


To use Theorem 2 to test $H_0$: $p_1 = p_1', \ldots, p_k = p_k'$, we need only to
compute the quantity

$$u = \sum_{i=1}^{k} \frac{(x_i - np_i')^2}{np_i'}$$

from the sample; if $n$ is large, we reject $H_0$ if $u > \chi^2_{k-1,\alpha}$.


Example 2. A die is rolled 120 times with the following results:

    Face value:   1   2   3   4   5   6
    Frequency:   20  30  20  25  15  10

Let us test the hypothesis that the die is fair at level $\alpha = .05$. The null
hypothesis is $H_0$: $p_i = \frac{1}{6}$, $i = 1, 2, \ldots, 6$, where $p_i$ is the probability that the
face value is $i$, $1 \le i \le 6$. By Theorem 2 we reject $H_0$ if

$$u = \sum_{i=1}^{6} \frac{[x_i - 120(\frac{1}{6})]^2}{120(\frac{1}{6})} > \chi^2_{5,.05}.$$

We have

$$u = 0 + \frac{10^2}{20} + 0 + \frac{5^2}{20} + \frac{5^2}{20} + \frac{10^2}{20} = 12.5.$$

Since $\chi^2_{5,.05} = 11.07$, we reject $H_0$. Note that, if we choose $\alpha = .025$, then
$\chi^2_{5,.025} = 12.8$, and we cannot reject at this level.
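The arithmetic of Example 2 can be checked with a short script. This is only a sketch: the function name `pearson_chi_square` is an illustrative choice, and the critical values are the tabulated ones quoted in the example rather than values computed by the code.

```python
# Pearson chi-square statistic for the die data of Example 2.
observed = [20, 30, 20, 25, 15, 10]   # frequencies of faces 1..6
n = sum(observed)                      # 120 rolls
probs = [1 / 6] * 6                    # H0: the die is fair


def pearson_chi_square(obs, p, n):
    """Return the sum over cells of (x_i - n p_i)^2 / (n p_i)."""
    return sum((x - n * pi) ** 2 / (n * pi) for x, pi in zip(obs, p))


u = pearson_chi_square(observed, probs, n)

# Tabulated critical values quoted in the text (5 degrees of freedom).
CHI2_5_05 = 11.07
CHI2_5_025 = 12.8

reject_at_05 = u > CHI2_5_05
reject_at_025 = u > CHI2_5_025
print(round(u, 2), reject_at_05, reject_at_025)
```

The same function applies to any completely specified multinomial hypothesis; only `observed` and `probs` change.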




Theorem 2 has much wider applicability, and we will later study its
application to contingency tables. Here we consider the application of
Theorem 2 to testing the null hypothesis that the df of an rv $X$ has a specified
form.


Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample on $X$. Also, let $H_0$: $X \sim F$,
where the functional form of the df $F$ is known completely. Consider a
collection of disjoint Borel sets $A_1, A_2, \ldots, A_k$ that form a partition of the
real line. Let $P\{X \in A_i\} = p_i$, $i = 1, 2, \ldots, k$, and assume that $p_i > 0$ for
each $i$. Let $Y_j$ = number of $X_i$'s in $A_j$, $j = 1, 2, \ldots, k$, $i = 1, 2, \ldots, n$. Then
the joint distribution of $(Y_1, Y_2, \ldots, Y_{k-1})$ is multinomial with parameters
$n, p_1, p_2, \ldots, p_{k-1}$. Clearly $Y_k = n - Y_1 - \cdots - Y_{k-1}$ and
$p_k = 1 - p_1 - \cdots - p_{k-1}$.


The proof of Theorem 3 is obvious. One frequently selects $A_1, A_2, \ldots, A_k$
as disjoint intervals. Theorem 3 is especially useful when one or more of the 
parameters associated with the df F are unknown. In that case the following 
result is useful. 


Theorem 4. Let $H_0$: $X \sim F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_r)$ is unknown. Let
$X_1, X_2, \ldots, X_n$ be independent observations on $X$, and suppose that the
MLE's of $\theta_1, \theta_2, \ldots, \theta_r$ exist and are, respectively, $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r$. Let
$A_1, A_2, \ldots, A_k$ be a collection of disjoint Borel sets that cover the real line,
and let

$$\hat{p}_i = P_{\hat\theta}\{X \in A_i\} > 0, \qquad i = 1, 2, \ldots, k,$$

where $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_r)$, and $P_\theta$ is the probability distribution associated
with $F_\theta$. Let $Y_1, Y_2, \ldots, Y_k$ be the rv's defined as follows: $Y_i$ = number
of $X_1, X_2, \ldots, X_n$ in $A_i$, $i = 1, 2, \ldots, k$.

Then the rv

$$V_k = \sum_{i=1}^{k} \frac{(Y_i - n\hat{p}_i)^2}{n\hat{p}_i}$$

is asymptotically distributed as a $\chi^2(k - r - 1)$ rv (as $n \to \infty$).

The proof of Theorem 4 and some regularity conditions required on $F_\theta$
are given in Cramér [18], page 426.


To test $H_0$: $X \sim F$, where $F$ is completely specified, we reject $H_0$ if

$$u = \sum_{i=1}^{k} \frac{(y_i - np_i)^2}{np_i} > \chi^2_{k-1,\alpha},$$

provided that $n$ is sufficiently large. If the null hypothesis is $H_0$: $X \sim F_\theta$,
where $F_\theta$ is known except for the parameter $\theta$, we use Theorem 4 and reject
$H_0$ if

$$v = \sum_{i=1}^{k} \frac{(y_i - n\hat{p}_i)^2}{n\hat{p}_i} > \chi^2_{k-r-1,\alpha},$$

where $r$ is the number of parameters estimated.


Example 3. The following data were obtained from a table of random
numbers of normal distribution with mean 0 and variance 1.

     0.464    0.137    2.455   -0.323   -0.068
     0.906   -0.513   -0.525    0.595    0.881
    -0.482    1.678   -0.057   -1.228   -0.486
    -1.787   -0.261    1.237    1.046   -0.508

We want to test the null hypothesis that the df $F$ from which the data came
is normal with mean 0 and variance 1. Here $F$ is completely specified. Let
us choose three intervals $(-\infty, -.5]$, $(-.5, .5]$, and $(.5, \infty)$. We see that
$Y_1 = 5$, $Y_2 = 8$, and $Y_3 = 7$.

Also, if $Z$ is $\mathcal{N}(0, 1)$, then $p_1 = .3085$, $p_2 = .3830$, and $p_3 = .3085$. Thus

$$u = \sum_{i=1}^{3} \frac{(y_i - np_i)^2}{np_i}
   = \frac{(5 - 20 \times .3085)^2}{6.17} + \frac{(8 - 20 \times .383)^2}{7.66} + \frac{(7 - 20 \times .3085)^2}{6.17} < 1.$$

Also, $\chi^2_{2,.05} = 5.99$, so that we accept $H_0$ at level .05.
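The cell counts and the statistic in Example 3 can be reproduced as follows. This is a sketch under the example's own choices (three cells cut at -.5 and .5); the helper names are illustrative, and the normal df is evaluated through `math.erf` rather than read from tables, so the cell probabilities differ from the tabulated ones in the fourth decimal.

```python
import math

data = [0.464, 0.137, 2.455, -0.323, -0.068,
        0.906, -0.513, -0.525, 0.595, 0.881,
        -0.482, 1.678, -0.057, -1.228, -0.486,
        -1.787, -0.261, 1.237, 1.046, -0.508]


def std_normal_cdf(z):
    """df of a standard normal rv."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Cells (-inf, -.5], (-.5, .5], (.5, inf): right-closed intervals.
cuts = [-0.5, 0.5]
counts = [0, 0, 0]
for x in data:
    counts[sum(x > c for c in cuts)] += 1

edges = [-math.inf] + cuts + [math.inf]
probs = [std_normal_cdf(hi) - std_normal_cdf(lo)
         for lo, hi in zip(edges[:-1], edges[1:])]

n = len(data)
u = sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

CHI2_2_05 = 5.99          # tabulated value quoted in the text
accept = u < CHI2_2_05
print(counts, [round(p, 4) for p in probs], round(u, 3), accept)
```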


Example 4. In a 72-hour period on a long holiday weekend there was a
total of 306 fatal automobile accidents. The data are as follows:

    Number of Fatal Accidents     Number of Hours
    per Hour
    0 or 1                         4
    2                             10
    3                             15
    4                             12
    5                             12
    6                              6
    7                              5
    8 or more                      7

Let us test the hypothesis that the number of accidents per hour is a
Poisson rv.
Since the mean of the Poisson rv is not given, we estimate it by

$$\hat\lambda = \bar{x} = \frac{306}{72} = 4.25.$$

Let us now estimate $\hat{p}_i = P_{\hat\lambda}\{X = i\}$, $i = 0, 1, 2, \ldots$; $\hat{p}_0 = e^{-\hat\lambda} = .0143$. Note
that

$$\frac{P\{X = x + 1\}}{P\{X = x\}} = \frac{\lambda}{x + 1},$$

so that $\hat{p}_{i+1} = [\hat\lambda/(i + 1)]\,\hat{p}_i$. Thus

    $\hat{p}_1 = .0606$, $\hat{p}_2 = .1288$, $\hat{p}_3 = .1825$, $\hat{p}_4 = .1939$,
    $\hat{p}_5 = .1648$, $\hat{p}_6 = .1167$, $\hat{p}_7 = .0709$, $\hat{p}_8 = 1 - .9325 = .0675$.

The observed and expected frequencies are as follows:

    i:                               0 or 1    2      3      4      5      6     7    8 or more
    Observed frequency, o_i:          4       10     15     12     12      6     5     7
    Expected frequency, 72 p_i = e_i: 5.38    9.28  13.14  13.96  11.87   8.41  5.10  4.86

Thus

$$u = \sum_{i=1}^{8} \frac{(o_i - e_i)^2}{e_i} = 2.58.$$

Since we estimated one parameter, the number of degrees of freedom is
$k - r - 1 = 8 - 1 - 1 = 6$. From Table 3 on page 652, $\chi^2_{6,.05} = 12.6$,
and since $2.58 < 12.6$ we accept the null hypothesis.
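The Poisson fit of Example 4 can be sketched in a few lines. The script follows the example's recursion for the cell probabilities and uses $n = 72$ hours and $\hat\lambda = 306/72$ as in the text; it carries full precision instead of four-decimal roundings, so the statistic comes out near, not exactly at, the printed 2.58. Variable names are illustrative.

```python
import math

# Observed hours per cell: 0 or 1, 2, 3, 4, 5, 6, 7, 8 or more (as tabulated).
observed = [4, 10, 15, 12, 12, 6, 5, 7]
hours = 72                      # n used in the text
lam_hat = 306 / hours           # estimate of the Poisson mean, 4.25

# p_0, ..., p_7 via p_{i+1} = lam / (i + 1) * p_i, starting from exp(-lam).
p = [math.exp(-lam_hat)]
for i in range(7):
    p.append(p[-1] * lam_hat / (i + 1))

# Pool 0 and 1 into one cell; the last cell takes the remaining mass.
cell_probs = [p[0] + p[1]] + p[2:8] + [1.0 - sum(p)]
expected = [hours * q for q in cell_probs]

u = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1      # k - r - 1 with r = 1 estimated parameter

CHI2_6_05 = 12.6                # tabulated value quoted in the text
accept = u < CHI2_6_05
print([round(e, 2) for e in expected], round(u, 2), df, accept)
```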


Remark 2. Any application of Theorem 3 or 4 requires that we choose sets 
$A_1, A_2, \ldots, A_k$, and frequently these are chosen to be disjoint intervals. As


the probability $P\{X \in A_i\}$ under $H_0$ is approximately $1/k$. Moreover, it is
desirable to have $n/k \ge 5$ or, rather, $e_i \ge 5$ for each $i$. If any of the $e_i$'s is


PROBLEMS 10.3 


1. The standard deviation of capacity for batteries of a standard type is known 
to be 1.66 ampere-hours. The following capacities (ampere-hours) were recorded 
2. A manufacturer recorded the cut-off bias (volts) of a sample of 10 tubes as
follows: 12.1, 12.3, 11.8, 12.0, 12.4, 12.0, 12.1, 11.9, 12.2, 12.2. The variability
of cut-off bias for tubes of a standard type as measured by the standard deviation
is .208 volts. Is the variability of the new tube with respect to cut-off bias less than
that of the standard type? (Natrella [84], 4-5)


3. (Approximately) equal numbers of four different types of meters are in service
and all types are believed to be equally likely to break down. The actual numbers 
of breakdowns reported are as follows: 


Type of meter: 1 2 3 4 
Number of breakdowns reported: 30 40 33 47 


Is there evidence to conclude that the chances of failure of the four types are not 
equal? (Natrella [84], 9-4) 


4. Every clinical thermometer is classified into one of four categories, A, B, C, D,
on the basis of inspection and test. From past experience it is known that
thermometers produced by a certain manufacturer are distributed among the four
categories in the following proportions:

    Category:      A     B     C     D
    Proportion:   .87   .09   .03   .01

A new lot of 1336 thermometers is submitted by the manufacturer for inspection
and test, and the following distribution into the four categories results:

    Category:                            A      B     C     D
    Number of thermometers reported:   1188    91    47    10

Does this new lot of thermometers differ from the previous experience with regard
to proportion of thermometers in each category? (Natrella [84], 9-2)


5. A computer program is written to generate random numbers, X, uniformly in 
the interval 0 < X < 10. From 250 consecutive values the following data are 
obtained: 


    X value:     0-1.99   2-3.99   4-5.99   6-7.99   8-9.99
    Frequency:     38       55       54       41       62


Do these data offer any evidence that the program is not written properly? 


6. A machine working correctly cuts pieces of wire to a mean length of 10.5 cm
with a standard deviation of 0.15 cm. Sixteen samples of wire were drawn at random
from a production batch and measured with the following results (centimeters):
10.4, 10.6, 10.1, 10.3, 10.2, 10.9, 10.5, 10.8, 10.6, 10.5, 10.7, 10.2, 10.7, 10.3,
10.4, 10.5. Test the hypothesis that the machine is working correctly.


7. An experiment consists in tossing a coin until the first head shows up. One
hundred repetitions of this experiment are performed. The frequency distribution
of the number of trials required for the first head is as follows:

    Number of trials:    1    2    3    4    5 or more
    Frequency:          40   32   15    7    6

Can we conclude that the coin is fair?
8. Fit a binomial distribution to the following data: 


x: 0 1 2 3 4 
Frequency: 8 46 55 40 1 


9. Prove Theorem 1. 


10.4 THE t-TESTS 


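In this section we investigate one of the most frequently used types of tests
in statistics, the tests based on a t-statistic. Let $X_1, X_2, \ldots, X_n$ be a random
sample from $\mathcal{N}(\mu, \sigma^2)$, and, as usual, let us write

$$\bar{x} = n^{-1} \sum_{i=1}^{n} x_i, \qquad s^2 = (n - 1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

In the following table we summarize some of the results we have already
encountered.

                                      $\sigma^2$ Known                                   $\sigma^2$ Unknown
         $H_0$             $H_1$                          Reject $H_0$ at level $\alpha$ if

    I.   $\mu \le \mu_0$   $\mu > \mu_0$     $\bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} z_\alpha$           $\bar{x} \ge \mu_0 + \frac{s}{\sqrt{n}} t_{n-1,\alpha}$
    II.  $\mu \ge \mu_0$   $\mu < \mu_0$     $\bar{x} \le \mu_0 + \frac{\sigma}{\sqrt{n}} z_{1-\alpha}$       $\bar{x} \le \mu_0 + \frac{s}{\sqrt{n}} t_{n-1,1-\alpha}$
    III. $\mu = \mu_0$     $\mu \ne \mu_0$   $|\bar{x} - \mu_0| \ge \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$     $|\bar{x} - \mu_0| \ge \frac{s}{\sqrt{n}} t_{n-1,\alpha/2}$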


Remark 1. We recall that a test based on a t-statistic is called a t-test. The
t-tests in I and II are called one-tailed tests; the t-test in III, a two-tailed test.


Remark 2. If $\sigma^2$ is known, tests I and II are UMP and test III is UMP
unbiased. If $\sigma^2$ is unknown, the t-tests are UMP unbiased and UMP invariant.
All these tests can be derived by the usual likelihood ratio procedure.


Remark 3. If $n$ is large ($\ge 30$), we may use normal tables instead of t-tables.
The assumption of normality may also be dropped because of the central
limit theorem. For small samples care is required in applying the proper
test, since the tail probabilities under normal distribution and t-distribution
differ significantly for small $n$. (See Remark 7.4.3.)


Example 1. Nine determinations of copper in a certain solution yielded a
sample mean of 8.3 percent with a standard deviation of .025 percent.
Let $\mu$ be the mean of the population of such determinations. Let us test
$H_0$: $\mu = 8.42$ against $H_1$: $\mu < 8.42$ at level $\alpha = .05$.
Here $n = 9$, $\bar{x} = 8.3$, $s = .025$, $\mu_0 = 8.42$, and $t_{n-1,1-\alpha} = -t_{8,.05} = -1.860$.
Thus

$$\mu_0 + \frac{s}{\sqrt{n}}\, t_{n-1,1-\alpha} = 8.42 - \frac{.025}{3}(1.86) = 8.4045.$$

We reject $H_0$ since $8.3 < 8.4045$.
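As a quick numerical check of Example 1, the sketch below recomputes the critical point for test II and, for comparison, the t-statistic itself. The value 1.860 is the tabulated $t_{8,.05}$ quoted in the example; the variable names are illustrative.

```python
import math

n, xbar, s, mu0 = 9, 8.3, 0.025, 8.42
T_8_05 = 1.860                       # tabulated t_{8,.05} from the text

# Test II: reject H0 if xbar <= mu0 + (s / sqrt(n)) * t_{n-1,1-alpha},
# where t_{n-1,1-alpha} = -t_{n-1,alpha}.
critical = mu0 - (s / math.sqrt(n)) * T_8_05
reject = xbar <= critical

# Equivalent form: compare the t-statistic with -t_{8,.05}.
t_stat = (xbar - mu0) / (s / math.sqrt(n))
print(round(critical, 4), reject, round(t_stat, 2))
```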
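We next consider the two-sample case. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$
be independent random samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$, respectively.
Let us write

$$\bar{x} = m^{-1} \sum_{i=1}^{m} x_i, \qquad \bar{y} = n^{-1} \sum_{i=1}^{n} y_i,$$

$$s_1^2 = (m - 1)^{-1} \sum_{i=1}^{m} (x_i - \bar{x})^2, \qquad s_2^2 = (n - 1)^{-1} \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

and

$$s_p^2 = \frac{(m - 1)s_1^2 + (n - 1)s_2^2}{m + n - 2}.$$

$s_p^2$ is sometimes called the pooled sample variance. The following table
summarizes the two sample tests comparing $\mu_1$ and $\mu_2$:

         $H_0$                        $H_1$                         $\sigma_1^2, \sigma_2^2$ Known                                                        $\sigma_1^2, \sigma_2^2$ Unknown, $\sigma_1 = \sigma_2$
         ($\delta$ = known constant)                                                Reject $H_0$ at level $\alpha$ if

    I.   $\mu_1 - \mu_2 \le \delta$   $\mu_1 - \mu_2 > \delta$     $\bar{x} - \bar{y} \ge \delta + z_\alpha \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$        $\bar{x} - \bar{y} \ge \delta + t_{m+n-2,\alpha}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}}$
    II.  $\mu_1 - \mu_2 \ge \delta$   $\mu_1 - \mu_2 < \delta$     $\bar{x} - \bar{y} \le \delta - z_\alpha \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$        $\bar{x} - \bar{y} \le \delta - t_{m+n-2,\alpha}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}}$
    III. $\mu_1 - \mu_2 = \delta$     $\mu_1 - \mu_2 \ne \delta$   $|\bar{x} - \bar{y} - \delta| \ge z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$   $|\bar{x} - \bar{y} - \delta| \ge t_{m+n-2,\alpha/2}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}}$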


Remark 4. The case of most interest is that in which $\delta = 0$. If $\sigma_1^2, \sigma_2^2$ are
unknown and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown, then $s_p^2$ is an unbiased estimate
of $\sigma^2$. In this case all the two-sample t-tests are UMP unbiased and UMP
invariant. Before applying the t-test, one should first make sure that
$\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown. This means applying another test on the data.
We will consider this test in the next section.


Remark 5. If $m + n$ is large, we use normal tables; if both $m$ and $n$ are
е ооо а gry ш bps 
Remark 6. The problem of equality of means in sampling from several
populations will be considered in Chapter 12.


Example 2. The mean life of a sample of 9 light bulbs was observed to be 
1309 hours with a standard deviation of 420 hours. A second sample of 
16 bulbs chosen from a different batch showed a mean life of 1205 hours 
with a standard deviation of 390 hours. Let us test to see whether there 
is a significant difference between the means of the two batches, assuming 
that the population variances are the same (see also Example 10.5.1). 


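Here $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 \ne \mu_2$, $m = 9$, $n = 16$, $\bar{x} = 1309$, $s_1 = 420$,
$\bar{y} = 1205$, $s_2 = 390$, and let us take $\alpha = .05$. We have

$$s_p^2 = \frac{8(420)^2 + 15(390)^2}{23},$$

so that

$$t_{m+n-2,\alpha/2}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}}
  = t_{23,.025} \sqrt{\frac{8(420)^2 + 15(390)^2}{23}} \sqrt{\frac{1}{9} + \frac{1}{16}}
  = 2.069 \times \sqrt{160552.17} \times \frac{5}{12} = 345.44.$$

Since $|\bar{x} - \bar{y}| = |1309 - 1205| = 104 \not\ge 345.44$, we cannot reject $H_0$ at level
$\alpha = .05$.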
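The pooled-variance computation in Example 2 is easy to mistype by hand, so here is a minimal sketch of it. The critical value 2.069 is the tabulated $t_{23,.025}$ from the example; everything else is recomputed, and the last digit may differ slightly from the printed 345.44 because no intermediate rounding is applied.

```python
import math

m, n = 9, 16
xbar, s1 = 1309.0, 420.0
ybar, s2 = 1205.0, 390.0
T_23_025 = 2.069                     # tabulated t_{23,.025} from the text

# Pooled sample variance s_p^2.
sp2 = ((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2)

# Test III with delta = 0: reject if |xbar - ybar| >= t * s_p * sqrt(1/m + 1/n).
half_width = T_23_025 * math.sqrt(sp2) * math.sqrt(1 / m + 1 / n)
reject = abs(xbar - ybar) >= half_width
print(round(sp2, 2), round(half_width, 2), reject)
```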


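Quite frequently one samples from a bivariate normal population with
means $\mu_1, \mu_2$, variances $\sigma_1^2, \sigma_2^2$, and correlation coefficient $\rho$, the hypothesis
of interest being $\mu_1 = \mu_2$. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from
a bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. Then
$X_j - Y_j$ is $\mathcal{N}(\mu_1 - \mu_2, \sigma^2)$, where $\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$. We can therefore
treat $D_j = (X_j - Y_j)$, $j = 1, 2, \ldots, n$, as a sample from a normal population
(see Problem 7.6.5). Let us write

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad \text{and} \qquad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}.$$

The following table summarizes the resulting tests:

         $H_0$                     $H_1$
         ($d_0$ = known constant)                          Reject $H_0$ at level $\alpha$ if

    I.   $\mu_1 - \mu_2 \ge d_0$   $\mu_1 - \mu_2 < d_0$     $\bar{d} \le d_0 + \frac{s_d}{\sqrt{n}}\, t_{n-1,1-\alpha}$
    II.  $\mu_1 - \mu_2 \le d_0$   $\mu_1 - \mu_2 > d_0$     $\bar{d} \ge d_0 + \frac{s_d}{\sqrt{n}}\, t_{n-1,\alpha}$
    III. $\mu_1 - \mu_2 = d_0$     $\mu_1 - \mu_2 \ne d_0$   $|\bar{d} - d_0| \ge \frac{s_d}{\sqrt{n}}\, t_{n-1,\alpha/2}$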
Remark 7. The case of most importance is that in which $d_0 = 0$. Clearly
all the t-tests are UMP unbiased and UMP invariant (see Remark 2). If $\sigma$
is known, one can base the test on a standardized normal rv, but in practice
such an assumption is quite unrealistic. If $n$ is large ($\ge 30$), one can replace
t-values by the corresponding critical values under the normal distribution.


Remark 8. Clearly, it is not necessary to assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$
is a sample from a bivariate normal population. It suffices to assume that
the differences $D_i$ form a sample from a normal population.


Example 3. Nine adults agreed to test the efficacy of a new diet program.
Their weights (pounds) were measured before and after the program and
found to be as follows:

               1    2    3    4    5    6    7    8    9
    Before:  132  139  126  114  122  132  142  119  126
    After:   124  141  118  116  114  132  145  123  121

Let us test the null hypothesis that the diet is not effective, $H_0$: $\mu_1 - \mu_2 = 0$,
against the alternative, $H_1$: $\mu_1 - \mu_2 > 0$, that it is effective at level $\alpha = .01$.
We compute

$$\bar{d} = \frac{8 - 2 + 8 - 2 + 8 + 0 - 3 - 4 + 5}{9} = \frac{18}{9} = 2,$$

$$s_d^2 = \frac{1}{8}\{(8 - 2)^2 + (-2 - 2)^2 + (8 - 2)^2 + (-2 - 2)^2 + (8 - 2)^2
        + (0 - 2)^2 + (-3 - 2)^2 + (-4 - 2)^2 + (5 - 2)^2\} = 26.75.$$

Thus

$$d_0 + \frac{s_d}{\sqrt{n}}\, t_{n-1,\alpha} = 0 + \sqrt{\frac{26.75}{9}} \times 2.896 = 4.99.$$

Since $\bar{d} \not\ge 4.99$, we accept hypothesis $H_0$ that the diet is not very effective.
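The paired-difference calculation of Example 3 can be verified directly from the tabulated weights. The sketch below recomputes $\bar{d}$, $s_d^2$, and the critical point for test II; the value 2.896 is the tabulated $t_{8,.01}$ from the example, and the variable names are illustrative.

```python
import math

before = [132, 139, 126, 114, 122, 132, 142, 119, 126]
after = [124, 141, 118, 116, 114, 132, 145, 123, 121]

# Differences d_i = before_i - after_i; a positive mean favors the diet.
d = [b - a for b, a in zip(before, after)]
n = len(d)
dbar = sum(d) / n
sd2 = sum((x - dbar) ** 2 for x in d) / (n - 1)

T_8_01 = 2.896                       # tabulated t_{8,.01} from the text
d0 = 0.0
critical = d0 + math.sqrt(sd2 / n) * T_8_01
reject = dbar >= critical
print(d, dbar, sd2, round(critical, 2), reject)
```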


PROBLEMS 10.4 


1. The manufacturer of a certain subcompact car claims that the average mileage 
of this model is 30 miles per gallon of regular gasoline. For nine cars of this model 
driven in an identical manner, using 1 gallon of regular gasoline, the mean distance 
traveled was 26 miles with a standard deviation of 2.8 miles. Test the manufac- 
turer's claim if you are willing to reject a true claim no more than twice in 100. 


2. The nicotine contents of five cigarettes of a certain brand showed a mean of
21.2 milligrams with a standard deviation of 2.05 milligrams. Test the hypothesis
that the average nicotine content of this brand of cigarettes does not exceed 19.7
milligrams. Use $\alpha = .05$.


3. The additional hours of sleep gained by eight patients in an experiment with a 
certain drug were recorded as follows: 


Patient: 1 2 3 4 5 6 7 8 
. Hours gained: А В 205 De 73:0 


Assuming that these patients form a random sample from a population of such 
patients and that the number of additional hours gained from the drug is a normal 
random variable, test the hypothesis that the drug has no effect at level $\alpha = .10$.


4. The mean life of a sample of 8 light bulbs was found to be 1432 hours with 
a standard deviation of 436 hours. A second sample of 19 bulbs chosen from a 
different batch produced a mean life of 1310 hours with a standard deviation of 
382 hours. Making appropriate assumptions, test the hypothesis that the two
samples came from the same population of light bulbs at level $\alpha = .05$.


5. A sample of 25 observations has a mean of 57.6 and a variance of 1.8. A
further sample of 20 values has a mean of 55.4 and a variance of 2.5. Test the
hypothesis that the two samples came from the same normal population.


6. Two methods were used in a study of the latent heat of fusion of ice. Both
method A and method B were conducted with the specimens cooled to -0.72°C.
The following data represent the change in total heat from -0.72°C to water, 0°C,
in calories per gram of mass:


Method A: 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03,
          80.02, 80.00, 80.02
Method B: 80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97


Perform a test at level .05 to see whether the two methods differ with regard to
their average performance. (Natrella [84], 3-23)


7. In Problem 6, if it is known from past experience that the standard deviations
of the two methods are $\sigma_A = .024$ and $\sigma_B = .033$, test the hypothesis that the
methods are the same with regard to their average performance at level $\alpha = .05$.


8. During World War II bacterial polysaccharides were investigated as blood
plasma extenders. Sixteen samples of hydrolyzed polysaccharides supplied by
various manufacturers in order to assess two chemical methods for determining
the average molecular weight yielded the following results:


Method А: 62,700; 29, 100; 44,400; 47,800; 36,300; 40,000; 43,400; 35,800; 
Method B: 56,400; 27,500; 42,200; 46,800; 33,300; 37,100; 37,300; 36,200; 
77+ 35,200; 38,000; 32,200; 27,300; 36,100; 43,100; 38,400; 39,900 
Perform an appropriate test of the hypothesis that the two averages are the same 
against a one-sided alternative that the average of Method A exceeds that of
Method B. Use $\alpha = .05$. (Natrella [84], 3-38)


9. The following grade-point averages were collected over a period of 7 years to 
determine whether membership in a fraternity is beneficial or detrimental to grades: 


    Year:            1    2    3    4    5    6    7
    Fraternity:     2.4  2.0  2.3  2.1  2.1  2.0  2.0
    Nonfraternity:  2.4  2.2  2.5  2.4  2.3  1.8  1.9


Assuming that the populations were normal, test at the .025 level of significance 
whether membership in a fraternity is detrimental to grades. 


10.5 THE F-TESTS 


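The term F-tests refers to tests based on an F-statistic. Let $X_1, X_2, \ldots, X_m$ and
$Y_1, Y_2, \ldots, Y_n$ be independent samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$,
respectively. We recall that $\sum (X_i - \bar{X})^2/\sigma_1^2 \sim \chi^2(m - 1)$ and
$\sum (Y_i - \bar{Y})^2/\sigma_2^2 \sim \chi^2(n - 1)$ are independent rv's, so that the rv

$$F(X, Y) = \frac{\sum_{i=1}^{m} (X_i - \bar{X})^2 / [(m - 1)\sigma_1^2]}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 / [(n - 1)\sigma_2^2]}
          = \frac{\sigma_2^2}{\sigma_1^2} \frac{S_1^2}{S_2^2}$$

is distributed as $F(m - 1, n - 1)$.
The following table summarizes the F-tests:

                                                           $\mu_1, \mu_2$ Known                                                   $\mu_1, \mu_2$ Unknown
         $H_0$                        $H_1$                                        Reject $H_0$ at level $\alpha$ if

    I.   $\sigma_1^2 \le \sigma_2^2$  $\sigma_1^2 > \sigma_2^2$    $\frac{\sum (x_i - \mu_1)^2}{\sum (y_i - \mu_2)^2} \ge \frac{m}{n} F_{m,n,\alpha}$       $\frac{s_1^2}{s_2^2} \ge F_{m-1,n-1,\alpha}$

    II.  $\sigma_1^2 \ge \sigma_2^2$  $\sigma_1^2 < \sigma_2^2$    $\frac{\sum (y_i - \mu_2)^2}{\sum (x_i - \mu_1)^2} \ge \frac{n}{m} F_{n,m,\alpha}$       $\frac{s_2^2}{s_1^2} \ge F_{n-1,m-1,\alpha}$

    III. $\sigma_1^2 = \sigma_2^2$    $\sigma_1^2 \ne \sigma_2^2$  $\frac{\sum (x_i - \mu_1)^2}{\sum (y_i - \mu_2)^2} \ge \frac{m}{n} F_{m,n,\alpha/2}$     $\frac{s_1^2}{s_2^2} \ge F_{m-1,n-1,\alpha/2}$ if $s_1^2 \ge s_2^2$
                                                                   or                                                                                      or
                                                                   $\frac{\sum (y_i - \mu_2)^2}{\sum (x_i - \mu_1)^2} \ge \frac{n}{m} F_{n,m,\alpha/2}$     $\frac{s_2^2}{s_1^2} \ge F_{n-1,m-1,\alpha/2}$ if $s_1^2 < s_2^2$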
Remark 1. The tests are based on the right-hand tail of the F-distribution.
To do so one uses the result (see Remark 7.4.6)

$$F_{m,n,1-\alpha} = (F_{n,m,\alpha})^{-1}.$$

Remark 2. We have seen that the tests described can be easily obtained
from the likelihood ratio procedure. Moreover, in the important case where
$\mu_1, \mu_2$ are unknown, tests I and II are UMP unbiased and UMP invariant.
For test III we have chosen equal tails, as is customarily done for
convenience even though the unbiasedness property of the test is thereby destroyed.


Example 1 (Example 10.4.2 continued). In Example 10.4.2 let us test the
validity of the assumption on which the t-test was based, namely, that the
two populations have the same variance at level .05. Since $s_2 < s_1$, we compute
$s_1^2/s_2^2 = (420/390)^2 = 196/169 = 1.16$. Since $F_{m-1,n-1,\alpha/2} = F_{8,15,.025} = 3.20$,
we cannot reject $H_0$: $\sigma_1 = \sigma_2$.
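The variance-ratio step of Example 1 amounts to one division and one table lookup, sketched below. The critical value 3.20 is the tabulated $F_{8,15,.025}$ quoted in the example; the code only forms the ratio with the larger sample variance in the numerator, as test III prescribes.

```python
m, n = 9, 16
s1, s2 = 420.0, 390.0

# Put the larger sample variance in the numerator (test III, equal tails).
if s1 >= s2:
    f_ratio, dof = (s1 / s2) ** 2, (m - 1, n - 1)
else:
    f_ratio, dof = (s2 / s1) ** 2, (n - 1, m - 1)

F_8_15_025 = 3.20                    # tabulated F_{8,15,.025} from the text
reject = f_ratio >= F_8_15_025
print(round(f_ratio, 2), dof, reject)
```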


An important application of the F-test involves the case where one is
testing the equality of means of two normal populations under the
assumption that the variances are the same, that is, testing whether the two samples
come from the same population. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be
independent samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$, respectively. If
$\sigma_1 = \sigma_2$ but is unknown, the t-test rejects $H_0$: $\mu_1 = \mu_2$ if $|T| > c$, where $c$ is
selected so that $\alpha_2 = P\{|T| > c \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\}$, that is,
$c = t_{m+n-2,\alpha_2/2}\, s_p \sqrt{1/m + 1/n}$, where

$$s_p^2 = \frac{(m - 1)s_1^2 + (n - 1)s_2^2}{m + n - 2},$$

$s_1^2, s_2^2$ being the sample variances. If first an F-test is performed to test
$\sigma_1 = \sigma_2$, and then a t-test to test $\mu_1 = \mu_2$ at levels $\alpha_1$ and $\alpha_2$, respectively,
the probability of accepting both hypotheses when they are true is

$$P\{|T| \le c,\ F \le c_1 \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\};$$

and if $F$ is independent of $T$, this probability is $(1 - \alpha_1)(1 - \alpha_2)$. It follows
that the combined test has a significance level $\alpha = 1 - (1 - \alpha_1)(1 - \alpha_2)$.
We see that

$$\alpha = \alpha_1 + \alpha_2 - \alpha_1\alpha_2 \le \alpha_1 + \alpha_2$$

and $\alpha \ge \max(\alpha_1, \alpha_2)$. In fact, $\alpha$ will be closer to $\alpha_1 + \alpha_2$, since for small $\alpha_1$
and $\alpha_2$, $\alpha_1\alpha_2$ will be closer to 0.
We show that $F$ is independent of $T$ whenever $\sigma_1 = \sigma_2$. The statistic
$V = (\bar{X}, \bar{Y}, \sum (X_i - \bar{X})^2 + \sum (Y_i - \bar{Y})^2)$ is a complete sufficient statistic for
the parameter $(\mu_1, \mu_2, \sigma_1 = \sigma_2)$ (see Theorem 8.3.4). Since the distribution
of $F$ does not depend on $\mu_1, \mu_2$ and $\sigma_1 = \sigma_2$, it follows (Problem 5) that $F$
is independent of $V$ whenever $\sigma_1 = \sigma_2$. But $T$ is a function of $V$ alone, so that
$F$ must be independent of $T$ also.

In Example 1, the combined test has a significance level of

$$\alpha = 1 - (.95)(.95) = 1 - .9025 = .0975.$$
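The combined level is a one-line formula, but a small sketch makes the two bounds stated above easy to check for any pair of levels. The function name `combined_level` is an illustrative choice, and the formula relies on the independence of $F$ and $T$ established in the text.

```python
def combined_level(alpha1, alpha2):
    """Level of an F-test at alpha1 followed by a t-test at alpha2,
    assuming the two statistics are independent."""
    return 1.0 - (1.0 - alpha1) * (1.0 - alpha2)


alpha = combined_level(0.05, 0.05)
print(round(alpha, 4))

# The bounds max(alpha1, alpha2) <= alpha <= alpha1 + alpha2 for a few pairs.
pairs = [(0.05, 0.05), (0.01, 0.05), (0.10, 0.025)]
bounds_hold = all(max(a1, a2) <= combined_level(a1, a2) <= a1 + a2
                  for a1, a2 in pairs)
```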


PROBLEMS 10.5 


1. For the data of Problem 10.4.4 is the assumption of equality of variances, on 
which the t-test is based, valid? 


2. Answer the same question for Problems 10.4.5 and 10.4.6. 

3. The performance of each of two different dive bombing methods is measured 
a dozen times. The sample variances for the two methods are computed to be 5545 
and 4073, respectively. Do the two methods differ in variability? 

4. In Problem 3 does the variability of the first method exceed that of the second 
method? 

5. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution with pdf
(pmf) $f(x, \theta)$, $\theta \in \Theta$, where $\Theta$ is an interval in $\mathcal{R}_k$. Let $T(X)$ be a complete sufficient
statistic for the family $\{f(x; \theta): \theta \in \Theta\}$. If $U(X)$ is a statistic (not a function of $T$
alone) whose distribution does not depend on $\theta$, show that $U$ is independent of $T$.


10.6 BAYES AND MINIMAX PROCEDURES 


Let $X_1, X_2, \ldots, X_n$ be a sample from a probability distribution with pdf (pmf)
$f_\theta$, $\theta \in \Theta$. In Section 8.8 we described the general decision problem, namely,
once the statistician observes $x$, he has a set $\mathcal{A}$ of options available.
The problem is to find a decision function $d$ that minimizes the risk
$R(\theta, d) = E_\theta L(\theta, d)$ in some sense. Thus a minimax solution requires the
minimization of $\max R(\theta, d)$, while a Bayes solution requires the minimization
of $R(\pi, d) = E R(\theta, d)$, where $\pi$ is the a priori distribution on $\Theta$. In Remark
9.2.2 we considered the problem of hypothesis testing as a special case of
the general decision problem. The set $\mathcal{A}$ contains two points, $a_0$ and $a_1$; $a_0$
corresponds to the acceptance of $H_0$: $\theta \in \Theta_0$, and $a_1$ corresponds to the
rejection of $H_0$. Suppose that the loss function is defined by

$$\begin{aligned}
L(\theta, a_0) &= a(\theta) && \text{if } \theta \in \Theta_1,\ a(\theta) > 0, \\
L(\theta, a_1) &= b(\theta) && \text{if } \theta \in \Theta_0,\ b(\theta) > 0, \\
L(\theta, a_0) &= 0 && \text{if } \theta \in \Theta_0, \\
L(\theta, a_1) &= 0 && \text{if } \theta \in \Theta_1.
\end{aligned} \tag{1}$$

Then

$$R(\theta, d(X)) = L(\theta, a_0) P_\theta\{d(X) = a_0\} + L(\theta, a_1) P_\theta\{d(X) = a_1\} \tag{2}$$

$$\phantom{R(\theta, d(X))} = \begin{cases}
a(\theta) P_\theta\{d(X) = a_0\} & \text{if } \theta \in \Theta_1, \\
b(\theta) P_\theta\{d(X) = a_1\} & \text{if } \theta \in \Theta_0.
\end{cases} \tag{3}$$

A minimax solution to the problem of testing $H_0$: $\theta \in \Theta_0$ against $H_1$: $\theta \in \Theta_1$,
where $\Theta = \Theta_0 + \Theta_1$, is to find a rule that minimizes

$$\max_\theta\, [a(\theta) P_\theta\{d(X) = a_0\},\ b(\theta) P_\theta\{d(X) = a_1\}].$$

We will consider here only the special case of testing $H_0$: $\theta = \theta_0$ against
$H_1$: $\theta = \theta_1$. In that case we want to find a rule $d$ which minimizes

$$\max\, [a P_{\theta_1}\{d(X) = a_0\},\ b P_{\theta_0}\{d(X) = a_1\}]. \tag{4}$$

We will show that the solution is to reject $H_0$ if

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge k, \tag{5}$$

provided that the constant $k$ is chosen so that

$$R(\theta_0, d(X)) = R(\theta_1, d(X)), \tag{6}$$

where $d$ is the rule defined in (5); that is, the minimax rule $d$ is obtained if
we choose $k$ in (5) so that

$$a P_{\theta_1}\{d(X) = a_0\} = b P_{\theta_0}\{d(X) = a_1\}, \tag{7}$$

or, equivalently, we choose $k$ so that

$$a P_{\theta_1}\left\{\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} < k\right\} = b P_{\theta_0}\left\{\frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} \ge k\right\}. \tag{8}$$

Let $d^*$ be any other rule. If $R(\theta_0, d) < R(\theta_0, d^*)$, then $R(\theta_0, d) = R(\theta_1, d)
< \max [R(\theta_0, d^*), R(\theta_1, d^*)]$ and $d^*$ cannot be minimax. Thus $R(\theta_0, d)
\ge R(\theta_0, d^*)$, which means that

$$P_{\theta_0}\{d^*(X) = a_1\} \le P_{\theta_0}\{d(X) = a_1\} = P\{\text{Reject } H_0 \mid H_0 \text{ true}\}. \tag{9}$$

By the Neyman-Pearson lemma, rule $d$ is the most powerful of its size, so
that its power must be at least that of $d^*$, that is,

$$P_{\theta_1}\{d(X) = a_1\} \ge P_{\theta_1}\{d^*(X) = a_1\},$$

so that

$$P_{\theta_1}\{d(X) = a_0\} \le P_{\theta_1}\{d^*(X) = a_0\}.$$

It follows that

$$a P_{\theta_1}\{d(X) = a_0\} \le a P_{\theta_1}\{d^*(X) = a_0\}$$

and hence that

$$R(\theta_1, d) \le R(\theta_1, d^*). \tag{10}$$

This means that

$$\max\, [R(\theta_0, d), R(\theta_1, d)] = R(\theta_1, d) \le R(\theta_1, d^*)$$

and thus

$$\max\, [R(\theta_0, d), R(\theta_1, d)] \le \max\, [R(\theta_0, d^*), R(\theta_1, d^*)].$$

Note that in the discrete case one may need some randomization procedure
in order to achieve equality in (8).


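Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ rv's. To test $H_0$: $\mu = \mu_0$
against $H_1$: $\mu = \mu_1$ ($> \mu_0$), we should choose $k$ so that (8) is satisfied.
This is the same as choosing $c$, and thus $k$, so that

$$a P_{\mu_1}\{\bar{X} < c\} = b P_{\mu_0}\{\bar{X} \ge c\}$$

or

$$a P_{\mu_1}\left\{\frac{\bar{X} - \mu_1}{1/\sqrt{n}} < \frac{c - \mu_1}{1/\sqrt{n}}\right\}
  = b P_{\mu_0}\left\{\frac{\bar{X} - \mu_0}{1/\sqrt{n}} \ge \frac{c - \mu_0}{1/\sqrt{n}}\right\}.$$

Thus

$$a \Phi[\sqrt{n}(c - \mu_1)] = b\{1 - \Phi[\sqrt{n}(c - \mu_0)]\},$$

where $\Phi$ is the df of an $\mathcal{N}(0, 1)$ rv. This can easily be accomplished with
the help of normal tables once we know $a$, $b$, $\mu_0$, $\mu_1$, and $n$.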
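Instead of normal tables, the equation $a \Phi[\sqrt{n}(c - \mu_1)] = b\{1 - \Phi[\sqrt{n}(c - \mu_0)]\}$ can be solved numerically. The sketch below uses plain bisection; the function names, the search bracket of 10 units on either side, and the numerical inputs are assumptions made for illustration and are not part of the text.

```python
import math


def phi(z):
    """df of a standard normal rv."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def minimax_cutoff(a, b, mu0, mu1, n, tol=1e-12):
    """Solve a*Phi(sqrt(n)(c - mu1)) = b*(1 - Phi(sqrt(n)(c - mu0))) for c.

    The left side increases in c and the right side decreases, so the
    difference g(c) has a single root, located here by bisection.
    """
    root_n = math.sqrt(n)

    def g(c):
        return a * phi(root_n * (c - mu1)) - b * (1.0 - phi(root_n * (c - mu0)))

    lo, hi = mu0 - 10.0, mu1 + 10.0   # heuristic bracket: g(lo) < 0 < g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative inputs: equal losses give the midpoint of mu0 and mu1.
c_equal = minimax_cutoff(a=1.0, b=1.0, mu0=4.7, mu1=5.2, n=15)
# A larger loss a for wrongly accepting H0 pulls the cutoff toward mu0.
c_skewed = minimax_cutoff(a=2.0, b=1.0, mu0=4.7, mu1=5.2, n=15)
print(round(c_equal, 4), round(c_skewed, 4))
```

With $a = b$ the two error probabilities are equal at the midpoint by symmetry, which gives a convenient sanity check on the solver.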


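We next consider the problem of testing $H_0$: $\theta \in \Theta_0$ against $H_1$: $\theta \in \Theta_1$ from
a Bayesian point of view. Let $\pi(\theta)$ be the a priori probability distribution on
$\Theta$. Then

$$R(\pi, d) = E_\theta R(\theta, d(X)) = \begin{cases}
\int_\Theta R(\theta, d)\, \pi(\theta)\, d\theta & \text{if } \pi \text{ is a pdf}, \\
\sum_\Theta R(\theta, d)\, \pi(\theta) & \text{if } \pi \text{ is a pmf},
\end{cases}$$

$$\phantom{R(\pi, d)} = \begin{cases}
\int_{\Theta_0} b(\theta) \pi(\theta) P_\theta\{d(X) = a_1\}\, d\theta + \int_{\Theta_1} a(\theta) \pi(\theta) P_\theta\{d(X) = a_0\}\, d\theta & \text{if } \pi \text{ is a pdf}, \\
\sum_{\Theta_0} b(\theta) \pi(\theta) P_\theta\{d(X) = a_1\} + \sum_{\Theta_1} a(\theta) \pi(\theta) P_\theta\{d(X) = a_0\} & \text{if } \pi \text{ is a pmf}.
\end{cases} \tag{11}$$

The Bayes solution is a decision rule that minimizes $R(\pi, d)$. In what
follows we restrict our attention to the case where both $H_0$ and $H_1$ have
exactly one point each, that is, $\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$. Let $\pi(\theta_0) = \pi_0$ and
$\pi(\theta_1) = 1 - \pi_0 = \pi_1$. Then

$$R(\pi, d) = b \pi_0 P_{\theta_0}\{d(X) = a_1\} + a \pi_1 P_{\theta_1}\{d(X) = a_0\}, \tag{12}$$

where $b(\theta_0) = b$, $a(\theta_1) = a$ ($a, b > 0$).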


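Theorem 1. Let $X = (X_1, X_2, \ldots, X_n)$ be an rv of the discrete (continuous)
type with pmf (pdf) $f_\theta$, $\theta \in \Theta = \{\theta_0, \theta_1\}$. Let $\pi(\theta_0) = \pi_0$, $\pi(\theta_1) = 1 - \pi_0 = \pi_1$
be the a priori probability mass function on $\Theta$. A Bayes solution for testing
$H_0$: $X \sim f_{\theta_0}$ against $H_1$: $X \sim f_{\theta_1}$, using the loss function (1), is to reject $H_0$ if

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge \frac{b \pi_0}{a \pi_1}. \tag{13}$$

Proof. We wish to find $d$ which minimizes

$$R(\pi, d) = b \pi_0 P_{\theta_0}\{d(X) = a_1\} + a \pi_1 P_{\theta_1}\{d(X) = a_0\}.$$

Now

$$R(\pi, d) = E_\theta R(\theta, d) = E\{E_\theta[L(\theta, d) \mid X]\},$$

so it suffices to minimize $E_\theta\{L(\theta, d) \mid X\}$.

The a posteriori distribution of $\theta$ is given by

$$h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{\sum_\theta f_\theta(x) \pi(\theta)}
  = \frac{\pi(\theta) f_\theta(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)}$$

$$\phantom{h(\theta \mid x)} = \begin{cases}
\dfrac{\pi_0 f_{\theta_0}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} & \text{if } \theta = \theta_0, \\[2ex]
\dfrac{\pi_1 f_{\theta_1}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} & \text{if } \theta = \theta_1.
\end{cases} \tag{14}$$

Thus

$$E_\theta\{L(\theta, d(X)) \mid X = x\} = \begin{cases}
b\, h(\theta_0 \mid x), & \theta = \theta_0,\ d(X) = a_1, \\
a\, h(\theta_1 \mid x), & \theta = \theta_1,\ d(X) = a_0.
\end{cases}$$

It follows that we reject $H_0$, that is, $d(X) = a_1$, if

$$b\, h(\theta_0 \mid x) \le a\, h(\theta_1 \mid x),$$

which is the case if and only if

$$b \pi_0 f_{\theta_0}(x) \le a \pi_1 f_{\theta_1}(x),$$

as asserted.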
Remark 1. In the Neyman-Pearson lemma we fixed $P_{\theta_0}\{d(X) = a_1\}$, the
probability of rejecting $H_0$ when it is true, and minimized $P_{\theta_1}\{d(X) = a_0\}$,
the probability of accepting $H_0$ when it is false. Here we no longer have
a fixed level $\alpha$ for $P_{\theta_0}\{d(X) = a_1\}$. Instead we allow it to assume any value
as long as $R(\pi, d)$, defined in (12), is minimum.


Remark 2. It is easy to generalize Theorem 1 to the case of multiple
decisions. Let $X$ be an rv with pdf (pmf) $f_\theta$ where $\theta$ can take any of the $k$
values $\theta_1, \theta_2, \ldots, \theta_k$. The problem is to observe $x$ and decide which of the
$\theta_i$'s is the correct value of $\theta$. Let us write $H_i$: $\theta = \theta_i$, $i = 1, 2, \ldots, k$, and
assume that $\pi(\theta_i) = \pi_i$, $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} \pi_i = 1$, is the prior probability
distribution on $\Theta = \{\theta_1, \theta_2, \ldots, \theta_k\}$. Let

$$L(\theta_i, d) = \begin{cases}
1 & \text{if } d \text{ chooses } \theta_j,\ j \ne i, \\
0 & \text{if } d \text{ chooses } \theta_i.
\end{cases}$$

The problem is to find a rule $d$ that minimizes $R(\pi, d)$. We leave the reader
to show that a Bayes solution is to accept $H_i$: $\theta = \theta_i$ ($i = 1, 2, \ldots, k$) if

$$\pi_i f_{\theta_i}(x) \ge \pi_j f_{\theta_j}(x) \qquad \text{for all } j \ne i,\ j = 1, 2, \ldots, k, \tag{15}$$

where any point lying in more than one such region is assigned to any one
of them.


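Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ rv's. To test $H_0$: $\mu = \mu_0$ against
$H_1$: $\mu = \mu_1$ ($> \mu_0$), let us take $a = b$ in the loss function (1). Then Theorem 1
says that the Bayes rule is one that rejects $H_0$ if

$$\frac{f_{\mu_1}(x)}{f_{\mu_0}(x)} \ge \frac{\pi_0}{1 - \pi_0},$$

that is,

$$\exp\left\{-\frac{\sum_{i=1}^{n} (x_i - \mu_1)^2}{2} + \frac{\sum_{i=1}^{n} (x_i - \mu_0)^2}{2}\right\} \ge \frac{\pi_0}{1 - \pi_0},$$

$$\exp\left\{(\mu_1 - \mu_0) \sum_{i=1}^{n} x_i + \frac{n(\mu_0^2 - \mu_1^2)}{2}\right\} \ge \frac{\pi_0}{1 - \pi_0}.$$

This happens if and only if

$$\frac{1}{n} \sum_{i=1}^{n} x_i \ge \frac{1}{n} \frac{\log [\pi_0/(1 - \pi_0)]}{\mu_1 - \mu_0} + \frac{\mu_0 + \mu_1}{2},$$

where the logarithm is to the base $e$. It follows that, if $\pi_0 = \frac{1}{2}$, the rejection
region consists of

$$\bar{x} \ge \frac{\mu_0 + \mu_1}{2}.$$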
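The Bayes cutoff for $\bar{x}$ derived in Example 2 is simple to evaluate. The sketch below also allows unequal losses by using the threshold $b\pi_0/(a\pi_1)$ from (13) in place of $\pi_0/(1 - \pi_0)$, which is a direct extension rather than something worked out in the example; the function name and numerical inputs are illustrative assumptions.

```python
import math


def bayes_cutoff(pi0, mu0, mu1, n, a=1.0, b=1.0):
    """Cutoff c such that the Bayes rule rejects H0: mu = mu0 when xbar >= c.

    Derived from f_{mu1}(x) / f_{mu0}(x) >= b*pi0 / (a*pi1) for iid N(mu, 1)
    observations with mu1 > mu0. With a == b this is the formula of Example 2.
    """
    pi1 = 1.0 - pi0
    return math.log(b * pi0 / (a * pi1)) / (n * (mu1 - mu0)) + (mu0 + mu1) / 2.0


# Equal prior weights and equal losses: the cutoff is the midpoint.
c_half = bayes_cutoff(pi0=0.5, mu0=4.7, mu1=5.2, n=15)
# A prior favoring H0 raises the cutoff, so more evidence is needed to reject.
c_prior = bayes_cutoff(pi0=0.9, mu0=4.7, mu1=5.2, n=15)
print(round(c_half, 4), round(c_prior, 4))
```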
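Example 3. This example illustrates the result described in Remark 2.
Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, and suppose that $\mu$ can take
any one of the three values $\mu_1$, $\mu_2$, or $\mu_3$. Let $\mu_1 < \mu_2 < \mu_3$. Assume, for
simplicity, that $\pi_1 = \pi_2 = \pi_3$. Then we accept $H_i$: $\mu = \mu_i$, $i = 1, 2, 3$, if

$$\pi_i \exp\left\{-\sum_{k=1}^{n} \frac{(x_k - \mu_i)^2}{2}\right\} \ge \pi_j \exp\left\{-\sum_{k=1}^{n} \frac{(x_k - \mu_j)^2}{2}\right\}$$

for each $j \ne i$, $j = 1, 2, 3$.
It follows that we accept $H_i$ if

$$(\mu_i - \mu_j)\bar{x} + \frac{\mu_j^2 - \mu_i^2}{2} \ge 0, \qquad j = 1, 2, 3 \ (j \ne i),$$

that is,

$$\bar{x}(\mu_i - \mu_j) \ge \frac{(\mu_i - \mu_j)(\mu_i + \mu_j)}{2}.$$

Thus the acceptance region of $H_1$ is given by

$$\bar{x} \le \frac{\mu_1 + \mu_2}{2} \qquad \text{and} \qquad \bar{x} \le \frac{\mu_1 + \mu_3}{2}.$$

Also, the acceptance region of $H_2$ is given by

$$\bar{x} \ge \frac{\mu_1 + \mu_2}{2} \qquad \text{and} \qquad \bar{x} \le \frac{\mu_2 + \mu_3}{2},$$

and that of $H_3$ by

$$\bar{x} \ge \frac{\mu_1 + \mu_3}{2} \qquad \text{and} \qquad \bar{x} \ge \frac{\mu_2 + \mu_3}{2}.$$

In particular, if $\mu_1 = 0$, $\mu_2 = 2$, $\mu_3 = 4$, we accept $H_1$ if $\bar{x} \le 1$, $H_2$ if
$1 \le \bar{x} \le 3$, and $H_3$ if $\bar{x} \ge 3$. In this case, boundary points 1 and 3 have zero
probability, and it does not matter where we include them.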
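The multiple-decision rule (15) for normal means can be sketched as picking the index with the largest value of $\pi_i f_{\mu_i}(x)$. The code below works on the log scale and drops the term $-\frac{1}{2}\sum x_k^2$, which is common to all hypotheses; the function name and the sample size used in the check are illustrative assumptions.

```python
import math


def bayes_choice(xbar, mus, priors, n):
    """Return the 1-based index i of the accepted hypothesis H_i: mu = mu_i.

    log(pi_i * f_{mu_i}(x)) equals, up to a term free of i,
    log(pi_i) + n * (mu_i * xbar - mu_i**2 / 2).
    Ties are broken in favor of the smallest index.
    """
    scores = [math.log(p) + n * (mu * xbar - mu * mu / 2.0)
              for mu, p in zip(mus, priors)]
    return scores.index(max(scores)) + 1


mus = [0.0, 2.0, 4.0]
priors = [1 / 3, 1 / 3, 1 / 3]
choices = [bayes_choice(x, mus, priors, n=5) for x in (0.5, 2.0, 3.5)]
print(choices)
```

With equal priors the rule reduces to choosing the mean closest to $\bar{x}$, in agreement with the regions $\bar{x} \le 1$, $1 \le \bar{x} \le 3$, and $\bar{x} \ge 3$ found above.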


PROBLEMS 10.6 


1. In Example 1 let $n = 15$, $\mu_0 = 4.7$, and $\mu_1 = 5.2$, and choose $a = b > 0$. Find
the minimax test, and compute its power at $\mu = 4.7$ and $\mu = 5.2$.

2. oe of five observations is taken on a b(1, 0)-rv-to test Hy: 0 = 1 against 
Ay: 6 = 4. { N 

(a), Find the most powerful test of size a = .05. 

(D IfLG, 4) = LG, 2) = 0, LG, 1) = 1, and LG, 4) = 2, find the minimax rule. 
(c) If the prior probabilities of б = 1 and б = 3 aré z, = } and 7; = +, respec- 

tively, find the Bayes rule. 3 




3. A sample of size n is to be used from the pdf 

f(x) = б et, х> 0, 
to test Hy: 0 = 1 against H,: 0 = 2. If the a priori distribution on 0 is ло = $, 
пу = i, and a = b, find the Bayes solution. Find the power of the test at 0 = 1 
and б = 2. 


4. Given two normal densities with variances 1 and with means —1 and 1, 

respectively, find the Bayes solution based on a single observation when a = b» 
and (a) ло = п, = +, and (b) ло = 4, л, =}. 

5. Given three normal densities with variances 1 and with means —1,0,1, 

respectively, find the Bayes solution to the multiple decision problem based on a 
single observation when л; = 3, л, = $, л; = {. 


6. For the multiple decision problem described in Remark 2 show that a Bayes
solution is to accept $H_i$: $\theta = \theta_i$ ($i = 1, 2, \ldots, k$) if (15) holds.


CHAPTER 11


Confidence Estimation


11.1 INTRODUCTION


In many problems of statistical inference the experimenter is interested in
constructing a family of sets that contain the true (unknown) parameter
value with a specified (high) probability. If $X$, for example, represents the
length of life of a piece of equipment, the experimenter is interested in a lower
bound $\underline{\theta}$ for the mean $\theta$ of $X$. Since $\underline{\theta} = \underline{\theta}(X)$ will be a function of the
observations, one cannot ensure with probability 1 that $\underline{\theta}(X) \le \theta$. All that one
can do is to choose a number $1 - \alpha$ that is close to 1 so that $P_\theta\{\underline{\theta}(X) \le \theta\}
\ge 1 - \alpha$ for all $\theta$. Problems of this type are called problems of confidence
estimation. In this chapter we restrict ourselves largely to the case where $\Theta \subseteq \mathcal{R}$
and consider the problem of setting confidence limits for the parameter $\theta$.
In Section 2 we introduce the basic ideas and study some simple methods
of finding confidence intervals. Section 3 deals with shortest-length confidence
intervals, while Section 4 explores the relationship between tests of
hypotheses and confidence intervals. In Section 5 we study the theory of
unbiased confidence intervals, and in Section 6 the method of construction of
Bayes confidence intervals is described.


11.2. SOME FUNDAMENTAL NOTIONS OF CONFIDENCE ESTIMATION 


So far we have considered a random variable or some function of it as the
basic observable quantity. Let $X$ be an rv, and let $a$, $b$ be two given positive
real numbers. Then

$$P\{a < X < b\} = P\{a < X \text{ and } X < b\}
= P\Big\{\frac{bX}{a} > b \text{ and } X < b\Big\}
= P\Big\{X < b < \frac{bX}{a}\Big\},$$

and if we know the distribution of $X$ and $a$, $b$, we can determine the
probability $P\{a < X < b\}$. Consider the interval $I(X) = (X, bX/a)$. This is an
interval with end points that are functions of the rv $X$, and hence it takes the
value $(x, bx/a)$ when $X$ takes the value $x$. In other words, $I(X)$ assumes the
value $I(x)$ whenever $X$ assumes the value $x$. Thus $I(X)$ is a random quantity
and is an example of a random interval. Note that $I(X)$ includes the value $b$
with a certain fixed probability. For example, if $b = 1$, $a = \frac{1}{2}$, and $X$ is
$U(0, 1)$, the interval $(X, 2X)$ includes the point 1 with probability $\frac{1}{2}$.
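The closing numerical claim can be checked by simulation. The following is only an illustrative sketch: the function names, the seed, and the number of trials are assumptions made here and are not part of the text. It draws $X$ from $U(0, 1)$ repeatedly and records how often the random interval $(X, bX/a)$ with $b = 1$, $a = \frac{1}{2}$ contains the fixed point $b$.

```python
import random


def covers_point(x, a, b):
    """Return True when the random interval (x, b*x/a) contains the fixed point b."""
    return x < b < b * x / a


def estimate_coverage(a=0.5, b=1.0, trials=100_000, seed=0):
    """Monte Carlo estimate of P{X < b < bX/a} for X ~ U(0, 1)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    hits = sum(covers_point(rng.random(), a, b) for _ in range(trials))
    return hits / trials


# The estimate should be close to the exact value P{1/2 < X < 1} = 1/2.
print(round(estimate_coverage(), 3))
```

The point $b$ is fixed throughout; only the interval changes from draw to draw, which is the sense in which the interval is random.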


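Definition 1. Let $P_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$, be the set of probability distributions of
an rv $X$. A family of subsets $S(x)$ of $\Theta$, where $S(x)$ depends on the
observation $x$ but not on $\theta$, is called a family of random sets. If, in particular,
$\Theta \subseteq \mathcal{R}$ and $S(x)$ is an interval $(\underline{\theta}(x), \overline{\theta}(x))$, where $\underline{\theta}(x)$ and $\overline{\theta}(x)$ are functions
of $x$ alone (and not $\theta$), we call $S(X)$ a random interval with $\underline{\theta}(X)$ and $\overline{\theta}(X)$ as
lower and upper bounds, respectively. $\underline{\theta}(X)$ may be $-\infty$, and $\overline{\theta}(X)$ may be $+\infty$.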


In a wide variety of inference problems one is not interested in estimating 
the parameter or testing some hypothesis concerning it. Rather, one wishes 
to establish a lower or an upper bound, or both, for the real-valued para- 
meter. For example, if X is the time to failure of a piece of equipment, one 
is interested in a lower bound for the mean of X. If the rv X measures the 
toxicity of a drug, the concern is to find an upper bound for the mean. 
Similarly, if the rv X measures the nicotine content of a certain brand of 
cigarettes, one is interested in determining an upper and a lower bound for
the average nicotine content of these cigarettes. 

In this chapter we are interested in the problem of confidence estimation,
namely, that of finding a family of random sets $S(x)$ for a parameter $\theta$ such
that, for a given $\alpha$, $0 < \alpha < 1$ (usually small),

$$P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha \quad \text{for all } \theta \in \Theta. \tag{1}$$

We restrict our attention mainly to the case where $\Theta \subseteq \mathcal{R}$.


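Definition 2. Let $\Theta \subseteq \mathcal{R}$ and $0 < \alpha < 1$. A function $\underline{\theta}(X)$ satisfying

$$P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha \quad \text{for all } \theta \tag{2}$$

is called a lower confidence bound for $\theta$ at confidence level $1 - \alpha$. The
quantity

$$\inf_{\theta \in \Theta} P_\theta\{\underline{\theta}(X) \le \theta\} \tag{3}$$

is called the confidence coefficient.

Definition 3. A function $\underline{\theta}$ that minimizes

$$P_\theta\{\underline{\theta}(X) \le \theta'\} \quad \text{for all } \theta' < \theta \tag{4}$$

subject to (2) is known as a uniformly most accurate (UMA) lower confidence
bound for $\theta$ at confidence level $1 - \alpha$.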


Similar definitions are given for an upper confidence bound for $\theta$ and a
UMA upper confidence bound.


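Definition 4. A family of subsets $S(x)$ of $\Theta \subseteq \mathcal{R}_k$ is said to constitute a
family of confidence sets at confidence level $1 - \alpha$ if

$$P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha \quad \text{for all } \theta \in \Theta, \tag{5}$$

that is, the random set $S(X)$ covers the true parameter value $\theta$ with
probability $\ge 1 - \alpha$. A lower confidence bound corresponds to the special case
where $k = 1$ and

$$S(x) = \{\theta\colon \underline{\theta}(x) \le \theta < \infty\}; \tag{6}$$

and an upper confidence bound, to the case where

$$S(x) = \{\theta\colon \overline{\theta}(x) \ge \theta > -\infty\}. \tag{7}$$

If $S(x)$ is of the form

$$S(x) = (\underline{\theta}(x), \overline{\theta}(x)), \tag{8}$$

we will call it a confidence interval at confidence level $1 - \alpha$, provided that

$$P_\theta\{\underline{\theta}(X) \le \theta \le \overline{\theta}(X)\} \ge 1 - \alpha \quad \text{for all } \theta, \tag{9}$$

and the quantity

$$\inf_{\theta} P_\theta\{\underline{\theta}(X) \le \theta \le \overline{\theta}(X)\} \tag{10}$$

will be referred to as the confidence coefficient associated with the random
interval.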


Remark 1. It is not quite correct to read (5) or (9) as follows: The
probability that $\theta$ lies in the set $S(X)$ is $\ge 1 - \alpha$. $\theta$ is fixed; it is $S(X)$ that is
random here. One can give the following interpretation to (5) or (9).
Choose and fix $\alpha$, $0 < \alpha < 1$ (usually small, say .01 or .05). Consider a
sequence of independent experiments in which each experiment consists
of taking a sample of size $n$ from a population distribution with unknown
parameter $\theta$, where $n$ and $\theta$ may vary from experiment to experiment. For
each sample point $x$, compute $S(x)$. Then the parameter $\theta$ of the corresponding
population may or may not be covered by $S(x)$. Although the set $S(x)$ will
vary from sample to sample, the probability that the statement "$S(x)$ includes
$\theta$" will be true is roughly (at least) $1 - \alpha$. Consequently the probability
of making a false statement is roughly (at most) $\alpha$. In the long run we
therefore expect to be correct in at least $100(1 - \alpha)$ percent of all cases.
Note that a given confidence set either includes the true parameter point
$\theta$ or does not. One would never know this unless $\theta$ was known. But the
point is that one would be successful in trapping this true value $\theta$ in sets
of this type at least $100(1 - \alpha)$ percent of the time.
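The long-run reading in Remark 1 can be imitated numerically. The sketch below is an illustration under stated assumptions, not a procedure from the text: it uses the normal-mean interval with known standard deviation ($\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$), and the ranges from which $n$ and the mean are drawn, the value of $\sigma$, and all function names are arbitrary choices made here. Both the sample size and the parameter change from experiment to experiment, as in the remark.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, alpha):
    """Interval for a normal mean with known sigma: mean +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / len(sample) ** 0.5
    center = fmean(sample)
    return center - half_width, center + half_width


def long_run_coverage(alpha=0.05, experiments=20_000, seed=1):
    """Fraction of independent experiments whose interval traps the true mean."""
    rng = random.Random(seed)
    sigma = 2.0  # assumed known in every experiment
    hits = 0
    for _ in range(experiments):
        n = rng.randint(5, 30)      # sample size varies between experiments
        mu = rng.uniform(-10, 10)   # so does the unknown parameter
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, alpha)
        hits += lower < mu < upper
    return hits / experiments


print(round(long_run_coverage(), 3))  # should be close to 1 - alpha = 0.95
```

Each individual interval either contains its own mean or it does not; only the long-run fraction of successes is controlled.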


Definition 5. A family of $1 - \alpha$ level confidence sets $\{S(x)\}$ is said to be
a UMA family of confidence sets at level $1 - \alpha$ if

$$P_\theta\{S(X) \text{ contains } \theta'\} \le P_\theta\{S'(X) \text{ contains } \theta'\}$$

for all $\theta \ne \theta'$ and any $1 - \alpha$ level family of confidence sets $S'(X)$.
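Example 1. Let $X_1, X_2, \ldots, X_n$ be iid rv's, $X_i \sim \mathcal{N}(\mu, \sigma^2)$. Consider the
interval $(\bar{X} - c_1, \bar{X} + c_2)$. In order for this to be a $1 - \alpha$ level confidence
interval, we must have

$$P\{\bar{X} - c_1 < \mu < \bar{X} + c_2\} \ge 1 - \alpha,$$

which is the same as

$$P\{\mu - c_2 < \bar{X} < \mu + c_1\} \ge 1 - \alpha.$$

Thus

$$P\Big\{-\frac{c_2\sqrt{n}}{\sigma} < \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} < \frac{c_1\sqrt{n}}{\sigma}\Big\} \ge 1 - \alpha.$$

Since $\sqrt{n}(\bar{X} - \mu)/\sigma \sim \mathcal{N}(0, 1)$, we can choose $c_1$ and $c_2$ to satisfy

$$P\Big\{-\frac{c_2\sqrt{n}}{\sigma} < \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} < \frac{c_1\sqrt{n}}{\sigma}\Big\} = 1 - \alpha,$$

provided that $\sigma$ is known. There are infinitely many such pairs of values
$(c_1, c_2)$. In particular, an intuitively reasonable choice is $c_1 = c_2 = c$, say.
In that case

$$\frac{c\sqrt{n}}{\sigma} = z_{\alpha/2},$$

and the confidence interval is $(\bar{X} - (\sigma/\sqrt{n})z_{\alpha/2},\ \bar{X} + (\sigma/\sqrt{n})z_{\alpha/2})$. The
length of this interval is $(2\sigma/\sqrt{n})z_{\alpha/2}$. Given $\sigma$ and $\alpha$, we can choose $n$
to get a confidence interval of a fixed length.

If $\sigma$ is not known, we have from

$$P\{-c_2 < \bar{X} - \mu < c_1\} \ge 1 - \alpha$$

that

$$P\Big\{-\frac{c_2\sqrt{n}}{S} < \frac{(\bar{X} - \mu)\sqrt{n}}{S} < \frac{c_1\sqrt{n}}{S}\Big\} \ge 1 - \alpha,$$

and once again we can choose pairs of values $(c_1, c_2)$ using a t-distribution
with $n - 1$ d.f. such that

$$P\Big\{-\frac{c_2\sqrt{n}}{S} < \frac{(\bar{X} - \mu)\sqrt{n}}{S} < \frac{c_1\sqrt{n}}{S}\Big\} = 1 - \alpha.$$

In particular, if we take $c_1 = c_2 = c$, say, then

$$\frac{c\sqrt{n}}{S} = t_{n-1,\alpha/2},$$

and $(\bar{X} - (S/\sqrt{n})t_{n-1,\alpha/2},\ \bar{X} + (S/\sqrt{n})t_{n-1,\alpha/2})$ is a $1 - \alpha$ level confidence
interval for $\mu$. The length of this interval is $(2S/\sqrt{n})t_{n-1,\alpha/2}$, which is
no longer constant. Therefore we cannot choose $n$ to get a fixed-width
confidence interval of level $1 - \alpha$. Indeed, the length of this interval
can be quite large if $\sigma$ is large. Its expected length is

$$\frac{2}{\sqrt{n}}\, t_{n-1,\alpha/2}\, E_\sigma S
= \frac{2}{\sqrt{n}}\, t_{n-1,\alpha/2} \sqrt{\frac{2}{n-1}}\, \frac{\Gamma(n/2)}{\Gamma((n-1)/2)}\, \sigma,$$

which can be made as small as we please by choosing $n$ large enough.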
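The factor multiplying $\sigma$ in the expected length comes from $E_\sigma S = \sqrt{2/(n-1)}\,\Gamma(n/2)\,\sigma/\Gamma((n-1)/2)$ for normal samples. As a hedged numerical check (the helper names `c_n`, `sample_sd`, and `simulated_mean_sd`, the seed, and the replication count are all assumptions introduced for this sketch), one can compare that constant with a simulated average of $S$:

```python
import math
import random


def c_n(n):
    """E S / sigma for a normal sample of size n, via the gamma-function formula."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def sample_sd(xs):
    """Sample standard deviation S with divisor n - 1."""
    mean = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


def simulated_mean_sd(n, sigma=1.0, reps=20_000, seed=2):
    """Average of S over many simulated normal samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sample_sd([rng.gauss(0.0, sigma) for _ in range(n)])
    return total / reps


# The two printed numbers should agree to about two decimal places.
print(round(c_n(5), 4), round(simulated_mean_sd(5), 3))
```

Because $S$ is biased downward for $\sigma$, the constant is below 1 and tends to 1 as $n$ grows, so the expected length behaves like $2 z_{\alpha/2}\sigma/\sqrt{n}$ for large $n$.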


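Example 2. In Example 1, suppose that we wish to find a confidence
interval for $\sigma^2$ instead when $\mu$ is unknown. Consider the interval $(c_1 S^2, c_2 S^2)$,
$c_1, c_2 > 0$. We have

$$P\{c_1 S^2 < \sigma^2 < c_2 S^2\} \ge 1 - \alpha,$$

so that

$$P\Big\{\frac{1}{c_2} < \frac{S^2}{\sigma^2} < \frac{1}{c_1}\Big\} \ge 1 - \alpha.$$

Since $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$, we can choose pairs of values $(c_1, c_2)$ from
the tables of the chi-square distribution. In particular, we can choose $c_1$, $c_2$
so that

$$P\Big\{\frac{S^2}{\sigma^2} \ge \frac{1}{c_1}\Big\} = \frac{\alpha}{2} = P\Big\{\frac{S^2}{\sigma^2} \le \frac{1}{c_2}\Big\}.$$

Then

$$\frac{n-1}{c_1} = \chi^2_{n-1,\alpha/2} \quad \text{and} \quad \frac{n-1}{c_2} = \chi^2_{n-1,1-\alpha/2}.$$

Thus

$$\Big(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\Big)$$

is a $1 - \alpha$ level confidence interval for $\sigma^2$ whenever $\mu$ is unknown. If $\mu$ is
known, then

$$\frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} \sim \chi^2(n).$$

Thus we can base the confidence interval on $\sum_{i=1}^n (X_i - \mu)^2$. Proceeding
similarly, we get a $1 - \alpha$ level confidence interval as

$$\Big(\frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}}\Big).$$

Next suppose that both $\mu$ and $\sigma^2$ are unknown and that we want a
confidence set for $(\mu, \sigma^2)$. We have from Boole's inequality

$$P\Big\{\bar{X} - \frac{S}{\sqrt{n}}\, t_{n-1,\alpha_1/2} < \mu < \bar{X} + \frac{S}{\sqrt{n}}\, t_{n-1,\alpha_1/2},\
\frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\Big\}$$

$$\ge 1 - P\Big\{\bar{X} + \frac{S}{\sqrt{n}}\, t_{n-1,\alpha_1/2} \le \mu \ \text{ or } \ \bar{X} - \frac{S}{\sqrt{n}}\, t_{n-1,\alpha_1/2} \ge \mu\Big\}$$

$$\quad - P\Big\{\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}} \le \sigma^2 \ \text{ or } \ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} \ge \sigma^2\Big\}$$

$$= 1 - \alpha_1 - \alpha_2,$$

so that the Cartesian product,

$$S(\mathbf{X}) = \Big(\bar{X} - \frac{S}{\sqrt{n}}\, t_{n-1,\alpha_1/2},\ \bar{X} + \frac{S}{\sqrt{n}}\, t_{n-1,\alpha_1/2}\Big)
\times \Big(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\Big),$$

is a $1 - \alpha_1 - \alpha_2$ level confidence set for $(\mu, \sigma^2)$.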


The following result provides a general method of finding confidence in- 
tervals and covers most cases in practice. 


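Theorem 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a df $F_\theta$, $\theta \in \Theta$,
where $\Theta$ is an interval of $\mathcal{R}$. Let $T(\mathbf{X}, \theta) = T(X_1, X_2, \ldots, X_n, \theta)$ be a real-valued
function defined on $\mathcal{R}_n \times \Theta$, such that, for each $\theta$, $T(\mathbf{X}, \theta)$ is a statistic,
and as a function of $\theta$, $T$ is strictly increasing or decreasing at every
$\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$. Let $\Lambda \subseteq \mathcal{R}$ be the range of $T$, and for every $\lambda \in \Lambda$
and $\mathbf{x} \in \mathcal{R}_n$, let the equation $\lambda = T(\mathbf{x}, \theta)$ be solvable. If the distribution of
$T(\mathbf{X}, \theta)$ is independent of $\theta$, one can construct a confidence interval for $\theta$
at any level.

Proof. Let $0 < \alpha < 1$. Then we can choose a pair of numbers $\lambda_1(\alpha)$ and
$\lambda_2(\alpha)$ in $\Lambda$, not necessarily unique, such that

$$P_\theta\{\lambda_1(\alpha) < T(\mathbf{X}, \theta) < \lambda_2(\alpha)\} \ge 1 - \alpha \quad \text{for all } \theta. \tag{11}$$

Since the distribution of $T$ is independent of $\theta$, it is clear that $\lambda_1$ and $\lambda_2$
are independent of $\theta$. Since, moreover, $T$ is monotone in $\theta$, we can solve
the equations

$$T(\mathbf{x}, \theta) = \lambda_1(\alpha) \quad \text{and} \quad T(\mathbf{x}, \theta) = \lambda_2(\alpha) \tag{12}$$

for every $\mathbf{x}$ uniquely for $\theta$. We have

$$P_\theta\{\underline{\theta}(\mathbf{X}) < \theta < \overline{\theta}(\mathbf{X})\} \ge 1 - \alpha \quad \text{for all } \theta, \tag{13}$$

where $\underline{\theta}(\mathbf{X}) < \overline{\theta}(\mathbf{X})$ are rv's. This completes the proof.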


Remark 2. The condition that $\lambda = T(\mathbf{x}, \theta)$ be solvable will be satisfied if,
for example, $T$ is continuous and strictly increasing or decreasing as a
function of $\theta$ in $\Theta$.


Remark 3. It is usually possible to find a function $T$ that is monotone and
has a distribution independent of $\theta$. For example, if $F_\theta$ is continuous and
monotone in $\theta$, we can take

$$T(\mathbf{X}, \theta) = \prod_{i=1}^n F_\theta(X_i).$$

Since $F_\theta$ is continuous, $F_\theta(X_i)$ are iid with $U[0, 1]$ distribution. Then

$$-\log T(\mathbf{X}, \theta) = -\sum_{i=1}^n \log F_\theta(X_i),$$

where $-\log F_\theta(X_i)$ are iid rv's, each with common $G(1, 1)$ distribution. It
follows that $-\log T(\mathbf{X}, \theta) \sim G(n, 1)$, and we can find $\lambda_1$, $\lambda_2$ such that

$$P_\theta\{-\log \lambda_2 < -\log T(\mathbf{X}, \theta) < -\log \lambda_1\}
= \frac{1}{\Gamma(n)} \int_{-\log \lambda_2}^{-\log \lambda_1} x^{n-1} e^{-x}\, dx = 1 - \alpha.$$

Thus

$$P_\theta\{\lambda_1 < T(\mathbf{X}, \theta) < \lambda_2\} = 1 - \alpha.$$

This last statement is equivalent to

$$P_\theta\{\underline{\theta}(\mathbf{X}) < \theta < \overline{\theta}(\mathbf{X})\} = 1 - \alpha.$$

Note that in the continuous case we can find a confidence interval with
equality on the right side of (11). In the discrete case, however, this is
usually not possible.

Remark 4. Relation (11) is valid even when the assumption of monotonicity
of $T$ in the theorem is dropped. In that case inversion of the
inequalities may yield a set of intervals (random set) $S(\mathbf{X})$ in $\Theta$ instead of
a confidence interval.


Remark 5. The argument used in Theorem 1 can be extended to cover the 
multiparameter case, and the method will determine a confidence set for all 
the parameters of a distribution. 


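Example 3. Let $X_1, X_2, \ldots, X_n \sim \mathcal{N}(\mu, \sigma^2)$, where $\sigma$ is unknown and we
seek a $1 - \alpha$ level confidence interval for $\mu$. Let us choose

$$T(\mathbf{X}, \mu) = \frac{(\bar{X} - \mu)\sqrt{n}}{S},$$

where $\bar{X}$, $S^2$ are the usual sample statistics. The conditions of Theorem 1
are clearly fulfilled. The rv $T(\mathbf{X}, \mu)$ has Student's t-distribution with $n - 1$
d.f., which is independent of $\mu$, and $T(\mathbf{X}, \mu)$, as a function of $\mu$, is monotone.
We can clearly choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ (not necessarily uniquely) so that

$$P\{\lambda_1(\alpha) < T(\mathbf{X}, \mu) < \lambda_2(\alpha)\} = 1 - \alpha \quad \text{for all } \mu.$$

Solving

$$\lambda_i(\alpha) = \frac{(\bar{X} - \mu)\sqrt{n}}{S}, \quad i = 1, 2,$$

we get

$$\underline{\mu}(\mathbf{X}) = \bar{X} - \frac{S}{\sqrt{n}}\, \lambda_2(\alpha), \qquad \overline{\mu}(\mathbf{X}) = \bar{X} - \frac{S}{\sqrt{n}}\, \lambda_1(\alpha),$$

and a $1 - \alpha$ level confidence interval is

$$\Big(\bar{X} - \frac{S}{\sqrt{n}}\, \lambda_2(\alpha),\ \bar{X} - \frac{S}{\sqrt{n}}\, \lambda_1(\alpha)\Big).$$

In practice, one chooses $\lambda_2(\alpha) = -\lambda_1(\alpha) = t_{n-1,\alpha/2}$.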


Remark 6. If a sufficiently large sample is available and the conditions 
of the central limit theorem hold, it is frequently possible to construct
approximate confidence intervals of any desired level. We illustrate the 
method with the following example. 


Example 4. Let $X_1, X_2, \ldots, X_n$ be iid rv's with finite variance. Also, let
$EX_i = \mu$ and $EX_i^2 = \sigma^2 + \mu^2$. From the CLT it follows that

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{L} Z,$$

where $Z \sim \mathcal{N}(0, 1)$. Suppose that we want a $1 - \alpha$ level confidence interval
for $\mu$ when $\sigma$ is not known. We know from Example 7.3.5 that for large $n$
the quantity $[\sqrt{n}(\bar{X} - \mu)/S]$ is approximately normally distributed with
mean 0 and variance 1. Hence, for large $n$, we can find constants $c_1$, $c_2$ such
that

$$P\Big\{c_1 < \frac{(\bar{X} - \mu)\sqrt{n}}{S} < c_2\Big\} = 1 - \alpha.$$

In particular, we can choose $-c_1 = c_2 = z_{\alpha/2}$ to give

$$\Big(\bar{X} - \frac{S}{\sqrt{n}}\, z_{\alpha/2},\ \bar{X} + \frac{S}{\sqrt{n}}\, z_{\alpha/2}\Big)$$

as an approximate $1 - \alpha$ level confidence interval for $\mu$.
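How close the actual coverage of this large-sample interval is to $1 - \alpha$ depends on $n$ and on the underlying distribution. The sketch below is illustrative only: the choice of an exponential population with mean 1, the sample size $n = 200$, the seed, and the function names are assumptions made here. It estimates the coverage by simulation.

```python
import math
import random
from statistics import NormalDist


def large_sample_interval(sample, alpha):
    """Approximate interval: sample mean +/- z_{alpha/2} * S / sqrt(n)."""
    n = len(sample)
    mean = math.fsum(sample) / n
    s = math.sqrt(math.fsum((x - mean) ** 2 for x in sample) / (n - 1))
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage_for_exponential(n=200, alpha=0.05, reps=5_000, seed=3):
    """Simulated coverage when sampling from a (non-normal) exponential population."""
    rng = random.Random(seed)
    mu = 1.0  # true mean of the assumed exponential population
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0 / mu) for _ in range(n)]
        lower, upper = large_sample_interval(sample, alpha)
        hits += lower < mu < upper
    return hits / reps


print(round(coverage_for_exponential(), 3))  # expected to be near, not exactly, 0.95
```

For small $n$ and a skewed population the simulated coverage can fall noticeably short of the nominal level, which is why the interval is described as approximate.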

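For large samples there is yet another possibility: using the maximum
likelihood estimate for the parameter $\theta$, provided that it exists. If $\hat{\theta}$ is the
MLE of $\theta$ and the conditions of Theorem 8.7.4 or 8.7.5 are satisfied
(caution: see Remark 8.7.6), then

$$\frac{\sqrt{n}(\hat{\theta} - \theta)}{\sigma} \xrightarrow{L} \mathcal{N}(0, 1) \quad \text{as } n \to \infty,$$

where

$$\sigma^2 = \Big[E_\theta\Big\{\frac{\partial \log f_\theta(X)}{\partial \theta}\Big\}^2\Big]^{-1}.$$

In such cases, we can invert the statement

$$P_\theta\Big\{-z_{\alpha/2} < \frac{(\hat{\theta} - \theta)\sqrt{n}}{\sigma} < z_{\alpha/2}\Big\} \ge 1 - \alpha$$

to give an approximate $1 - \alpha$ level confidence interval for $\theta$.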

Yet another possible procedure has universal applicability and hence can
be used for large or small samples. Unfortunately, however, this procedure
usually yields confidence intervals that are much too large. The method
employs the well-known Chebychev inequality (see Section 3.4):

$$P\{|X - EX| < \varepsilon\sqrt{\operatorname{var}(X)}\} \ge 1 - \frac{1}{\varepsilon^2}.$$

If $\hat{\theta}$ is an estimate of $\theta$ (not necessarily unbiased) with finite variance $\sigma^2(\theta)$,
then by Chebychev's inequality

$$P\{|\hat{\theta} - \theta| < \varepsilon\sqrt{E(\hat{\theta} - \theta)^2}\} \ge 1 - \frac{1}{\varepsilon^2}.$$

It follows that

$$\Big(\hat{\theta} - \varepsilon\sqrt{E(\hat{\theta} - \theta)^2},\ \hat{\theta} + \varepsilon\sqrt{E(\hat{\theta} - \theta)^2}\Big)$$

is a $1 - (1/\varepsilon^2)$ level confidence interval for $\theta$. Under some mild consistency
conditions one can replace the normalizing constant $\sqrt{E(\hat{\theta} - \theta)^2}$, which
will be some function $\lambda(\theta)$ of $\theta$, by $\lambda(\hat{\theta})$.
Note that the estimate $\hat{\theta}$ need not have a limiting normal law.


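Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's, and it is required to find
a confidence interval for $p$. We know that $E\bar{X} = p$, and

$$\operatorname{var}(\bar{X}) = \frac{\operatorname{var}(X_1)}{n} = \frac{p(1 - p)}{n}.$$

It follows that

$$P\Big\{|\bar{X} - p| < \varepsilon\sqrt{\frac{p(1 - p)}{n}}\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Since $p(1 - p) \le \frac{1}{4}$, we have

$$P\Big\{\bar{X} - \frac{\varepsilon}{2\sqrt{n}} < p < \bar{X} + \frac{\varepsilon}{2\sqrt{n}}\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

One can now choose $\varepsilon$ and $n$ or, if $n$ is kept constant at a given number, $\varepsilon$
to get the desired level.

Actually the confidence interval obtained above can be improved somewhat.
We note that

$$P\Big\{|\bar{X} - p| < \varepsilon\sqrt{\frac{p(1 - p)}{n}}\Big\} \ge 1 - \frac{1}{\varepsilon^2},$$

so that

$$P\Big\{|\bar{X} - p|^2 < \frac{\varepsilon^2 p(1 - p)}{n}\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Now

$$|\bar{X} - p|^2 < \frac{\varepsilon^2}{n}\, p(1 - p)$$

if and only if

$$\Big(1 + \frac{\varepsilon^2}{n}\Big)p^2 - \Big(2\bar{X} + \frac{\varepsilon^2}{n}\Big)p + \bar{X}^2 < 0.$$

This last inequality holds if and only if $p$ lies between the two roots of the
quadratic equation

$$\Big(1 + \frac{\varepsilon^2}{n}\Big)p^2 - \Big(2\bar{X} + \frac{\varepsilon^2}{n}\Big)p + \bar{X}^2 = 0.$$

The two roots are

$$p_1 = \frac{2\bar{X} + (\varepsilon^2/n) - \sqrt{[2\bar{X} + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\bar{X}^2}}{2[1 + (\varepsilon^2/n)]}
= \frac{\bar{X}}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) - \sqrt{4(\varepsilon^2/n)\bar{X}(1 - \bar{X}) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}$$

and

$$p_2 = \frac{2\bar{X} + (\varepsilon^2/n) + \sqrt{[2\bar{X} + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\bar{X}^2}}{2[1 + (\varepsilon^2/n)]}
= \frac{\bar{X}}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) + \sqrt{4(\varepsilon^2/n)\bar{X}(1 - \bar{X}) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}.$$

It follows that

$$P\{p_1 < p < p_2\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Note that when $n$ is large

$$p_1 \approx \bar{X} - \varepsilon\sqrt{\frac{\bar{X}(1 - \bar{X})}{n}}, \qquad p_2 \approx \bar{X} + \varepsilon\sqrt{\frac{\bar{X}(1 - \bar{X})}{n}},$$

as one should expect in view of the fact that $\bar{X} \to p$ with probability 1 and
$\sqrt{\bar{X}(1 - \bar{X})/n}$ estimates $\sqrt{p(1 - p)/n}$. Alternatively, we could have used
the CLT (or large-sample property of the MLE) to arrive at the same result
but with $\varepsilon$ replaced by $z_{\alpha/2}$.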
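The two Chebychev intervals of Example 5 are easy to compare numerically. The sketch below is an illustration only: the function names and the sample values ($\bar{X} = 0.3$, $n = 100$, $\varepsilon^2 = 20$, which corresponds to level $1 - 1/\varepsilon^2 = .95$) are assumptions chosen here. It computes the crude interval $\bar{X} \pm \varepsilon/(2\sqrt{n})$ and the roots $p_1$, $p_2$ of the quadratic.

```python
import math


def crude_interval(xbar, n, eps):
    """Interval xbar +/- eps / (2 sqrt(n)), obtained from p(1 - p) <= 1/4."""
    half_width = eps / (2.0 * math.sqrt(n))
    return xbar - half_width, xbar + half_width


def improved_interval(xbar, n, eps):
    """Roots p1 < p2 of (1 + a) p^2 - (2 xbar + a) p + xbar^2 = 0 with a = eps^2 / n."""
    a = eps ** 2 / n
    root_term = math.sqrt(4.0 * a * xbar * (1.0 - xbar) + a * a)
    denom = 2.0 * (1.0 + a)
    return (2.0 * xbar + a - root_term) / denom, (2.0 * xbar + a + root_term) / denom


# Illustrative numbers only (not taken from the text).
xbar, n, eps = 0.3, 100, math.sqrt(20.0)
print(crude_interval(xbar, n, eps))
print(improved_interval(xbar, n, eps))
```

Since any $p$ between the roots satisfies $|\bar{X} - p| < \varepsilon\sqrt{p(1-p)/n} \le \varepsilon/(2\sqrt{n})$, the improved interval always lies inside the crude one.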


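Example 6. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. We seek a
confidence interval for the parameter $\theta$. The estimate

$$\hat{\theta} = \max(X_1, X_2, \ldots, X_n) = M_n$$

is the MLE of $\theta$, which is also sufficient for $\theta$. The pdf of $M_n$ is given by

$$f_n(y \mid \theta) = \begin{cases} \dfrac{n y^{n-1}}{\theta^n}, & 0 < y < \theta, \\ 0, & \text{otherwise.} \end{cases}$$

The rv $T_n = M_n/\theta$ has pdf

$$h(t) = \begin{cases} n t^{n-1}, & 0 < t < 1, \\ 0, & \text{otherwise,} \end{cases}$$

which is independent of $\theta$. Applying Theorem 1, we find numbers $\lambda_1(\alpha)$,
$\lambda_2(\alpha)$ such that

$$P\{\lambda_1 < T_n < \lambda_2\} = 1 - \alpha \quad \text{for all } \theta,$$

that is,

$$n \int_{\lambda_1}^{\lambda_2} t^{n-1}\, dt = 1 - \alpha,$$

so that

$$\lambda_2^n - \lambda_1^n = 1 - \alpha.$$

This equation has infinitely many solutions. If we choose $\lambda_2 = 1$, then
$\lambda_1 = \alpha^{1/n}$ and a $1 - \alpha$ level confidence interval is given by $(M_n, \alpha^{-1/n} M_n)$.
In Example 11.3.4 we will show that this confidence interval has the smallest
length among all confidence intervals for $\theta$ based on $M_n$.

Let us now apply the method of Chebychev's inequality to the same
problem. We have

$$E_\theta M_n = \frac{n}{n+1}\, \theta$$

and

$$E_\theta (M_n - \theta)^2 = \frac{2\theta^2}{(n+1)(n+2)}.$$

Thus

$$P\Big\{\frac{|M_n - \theta|}{\theta} \sqrt{\frac{(n+1)(n+2)}{2}} < \varepsilon\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

Since $M_n \xrightarrow{P} \theta$, we replace $\theta$ by $M_n$, and, for moderately large $n$,

$$P\Big\{\frac{|M_n - \theta|}{M_n} \sqrt{\frac{(n+1)(n+2)}{2}} < \varepsilon\Big\} \ge 1 - \frac{1}{\varepsilon^2}.$$

It follows that

$$\Big(M_n - \varepsilon M_n \frac{\sqrt{2}}{\sqrt{(n+1)(n+2)}},\ M_n + \varepsilon M_n \frac{\sqrt{2}}{\sqrt{(n+1)(n+2)}}\Big)$$

is a $1 - (1/\varepsilon^2)$ confidence interval for $\theta$. Choosing $1 - (1/\varepsilon^2) = 1 - \alpha$, or
$\varepsilon = 1/\sqrt{\alpha}$, and noting that $1/\sqrt{(n+1)(n+2)} \approx 1/n$ for large $n$, and the
fact that $M_n \le \theta$, we can use the approximate confidence interval

$$\Big(M_n,\ M_n\Big(1 + \frac{1}{n}\sqrt{\frac{2}{\alpha}}\Big)\Big)$$

for $\theta$.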
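To see the two intervals of Example 6 side by side, the following sketch simulates the exact interval $(M_n, \alpha^{-1/n} M_n)$ and compares its upper end point with that of the Chebychev-based approximation. It is a hedged illustration: the value of $\theta$, the sample size, the seed, and the function names are all assumptions introduced here.

```python
import math
import random


def exact_interval(m_n, n, alpha):
    """Pivot-based interval (M_n, alpha^(-1/n) M_n) for theta in U(0, theta)."""
    return m_n, m_n * alpha ** (-1.0 / n)


def chebyshev_interval(m_n, n, alpha):
    """Approximate interval (M_n, M_n (1 + sqrt(2/alpha) / n)) from Chebychev's inequality."""
    return m_n, m_n * (1.0 + math.sqrt(2.0 / alpha) / n)


def coverage_of_exact_interval(theta=4.0, n=10, alpha=0.05, reps=50_000, seed=4):
    """Simulated probability that the exact interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        m_n = max(rng.uniform(0.0, theta) for _ in range(n))
        lower, upper = exact_interval(m_n, n, alpha)
        hits += lower < theta < upper
    return hits / reps


print(round(coverage_of_exact_interval(), 3))  # should be close to 0.95
print(exact_interval(1.0, 10, 0.05)[1], chebyshev_interval(1.0, 10, 0.05)[1])
```

With $n = 10$ and $\alpha = .05$ the Chebychev upper end point is visibly larger than the exact one, in line with the remark that this method usually yields intervals that are too large.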


In the examples given above we see that, for a given confidence level $1 - \alpha$,
a wide choice of confidence intervals is available. Clearly, the larger the
interval, the better is the chance of trapping a true parameter value. Thus
the interval $(-\infty, +\infty)$, which ignores the data completely, will include
the real-valued parameter $\theta$ with confidence level 1. However, the larger the
confidence interval, the less meaningful it is. Therefore, for a given confidence
level $1 - \alpha$, it is desirable to choose the shortest possible confidence interval.
Since the length $\overline{\theta} - \underline{\theta}$, in general, is a random variable, one can show that a
confidence interval of level $1 - \alpha$ with uniformly minimum length among
all such intervals does not exist in most cases. The alternative, to minimize
$E_\theta(\overline{\theta} - \underline{\theta})$, is also quite unsatisfactory. In the next section we consider the
problem of finding a shortest-length confidence interval based on some
suitable statistic.


PROBLEMS 11.2 


1. A sample of size 25 from a normal population with variance 81 produced a
mean of 81.2. Find a .95 level confidence interval for the mean $\mu$.

2. Let $\bar{X}$ be the mean of a random sample of size $n$ from $\mathcal{N}(\mu, 16)$. Find the
smallest sample size $n$ such that $(\bar{X} - 1, \bar{X} + 1)$ is a .90 level confidence interval
for $\mu$.

3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
$\mathcal{N}(\mu_1, \sigma^2)$ and $\mathcal{N}(\mu_2, \sigma^2)$, respectively. Find a confidence interval for $\mu_1 - \mu_2$ at
confidence level $1 - \alpha$ when (a) $\sigma$ is known, and (b) $\sigma$ is unknown.

4. Two independent samples, each of size 7, from normal populations with
common unknown variance $\sigma^2$ produced sample means 4.8 and 5.4 and sample
variances 8.38 and 7.62, respectively. Find a .95 level confidence interval for $\mu_1 - \mu_2$,
the difference between the means of samples 1 and 2.

5. In Problem 3 suppose that the first population has variance $\sigma_1^2$ and the second
population has variance $\sigma_2^2$, where both $\sigma_1^2$ and $\sigma_2^2$ are known. Find a $1 - \alpha$
level confidence interval for $\mu_1 - \mu_2$. What happens if both $\sigma_1^2$ and $\sigma_2^2$ are
unknown and unequal?

6. In Problem 5 find a confidence interval for the ratio $\sigma_2^2/\sigma_1^2$, both when $\mu_1$, $\mu_2$
are known and when $\mu_1$, $\mu_2$ are unknown. What happens if either $\mu_1$ or $\mu_2$ is
unknown but the other is known?

7. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(1, \beta)$ distribution. Find a confidence
interval for the parameter $\beta$ with confidence level $1 - \alpha$.

8. (a) Use the large-sample properties of the MLE to construct a $1 - \alpha$ level
confidence interval for the parameter $\theta$ in each of the following cases:
(i) $X_1, X_2, \ldots, X_n$ is a sample from $G(1, 1/\theta)$, and (ii) $X_1, X_2, \ldots, X_n$ is
a sample from $P(\theta)$.

(b) In part (a) use Chebychev's inequality to do the same.
9. For a sample of size 1 from the population

$$f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \quad 0 < x < \theta,$$

find a $1 - \alpha$ level confidence interval for $\theta$.


10. Let $X_1, X_2, \ldots, X_n$ be a sample from the uniform distribution on $N$ points.
Find an upper $1 - \alpha$ level confidence bound for $N$, based on $\max(X_1, X_2, \ldots, X_n)$.

11. In Example 6 find the smallest $n$ such that the length of the $1 - \alpha$ level
confidence interval $(M_n, \alpha^{-1/n} M_n)$ is $< d$, provided it is known that $\theta \le a$, where
$a$ is a known constant.

12. Let $X$ and $Y$ be independent rv's with pdf's $\lambda e^{-\lambda x}$ $(x > 0)$ and $\mu e^{-\mu y}$ $(y > 0)$,
respectively. Find a $1 - \alpha$ level confidence region for $(\lambda, \mu)$ of the form
$\{(\lambda, \mu)\colon \lambda X + \mu Y \le k\}$.


11.3 SHORTEST-LENGTH CONFIDENCE INTERVALS 


We have already remarked that we can increase the confidence level by
simply taking a larger-length confidence interval. Indeed, the worthless
interval $-\infty < \theta < \infty$, which simply says that $\theta$ lies somewhere on the real
line, has confidence level 1. In practice, one would like to set the level at a
given fixed number $1 - \alpha$ $(0 < \alpha < 1)$ and, if possible, construct an interval as
short as possible among all confidence intervals with the same level. Such an
interval is desirable since it is more informative. We have already remarked
that shortest-length confidence intervals do not always exist. In this section
we will investigate the possibility of constructing shortest-length confidence
intervals based on some simple rv's. The discussion here is based on Guenther
[40]. Theorem 11.2.1 is really the key to the following discussion.

Let $X_1, X_2, \ldots, X_n$ be a sample from a pdf $f_\theta(x)$, and let
$T(X_1, X_2, \ldots, X_n, \theta) = T_\theta$ be an rv with distribution independent of $\theta$. Also,
let $\lambda_1 = \lambda_1(\alpha)$, $\lambda_2 = \lambda_2(\alpha)$ be chosen so that

$$P\{\lambda_1 < T_\theta < \lambda_2\} = 1 - \alpha, \tag{1}$$

and suppose that (1) can be rewritten as

$$P\{\underline{\theta}(\mathbf{X}) < \theta < \overline{\theta}(\mathbf{X})\} = 1 - \alpha. \tag{2}$$

(See Theorem 11.2.1 for a set of sufficient conditions.)

For every $T_\theta$, $\lambda_1$ and $\lambda_2$ can be chosen in many ways. We would like to
choose $\lambda_1$ and $\lambda_2$ so that $\overline{\theta} - \underline{\theta}$ is minimum. Such an interval is a $1 - \alpha$ level
shortest-length confidence interval based on $T_\theta$. It may be possible, however,
to find another rv $T_\theta^*$ that may yield an even shorter interval. Therefore we
are not asserting that the procedure, if it succeeds, will lead to a $1 - \alpha$ level
confidence interval that has shortest length among all intervals of this level.
For $T_\theta$ we use the simplest rv that is a function of a sufficient statistic and
$\theta$. Let us first give a suitable name to $T(\mathbf{X}, \theta)$.


Definition 1. An rv $T(\mathbf{X}, \theta)$, which is a function of $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ and
$\theta$, and whose distribution is independent of $\theta$, is called a pivot.


For example, in sampling from a normal population, if the variance is
known, $(\bar{X} - \mu)\sqrt{n}/\sigma$ is a natural choice for a pivot. If $\sigma$ is unknown,
$(\bar{X} - \mu)\sqrt{n}/S$ is a natural choice for a pivot. If one desires a confidence
interval for the variance $\sigma^2$, $\sum_{i=1}^n (X_i - \mu)^2/\sigma^2$ is a pivot when $\mu$ is known; and, if $\mu$ is
unknown, $S^2/\sigma^2$ is a pivot.


Remark 1. An alternative to minimizing the length of the confidence
interval is to minimize the expected length $E_\theta\{\overline{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})\}$. Unfortunately,
this also is quite unsatisfactory since, in general, there does not exist a
member in the class of all $1 - \alpha$ level confidence intervals that minimizes
$E_\theta\{\overline{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})\}$ for all $\theta$. The procedures applied in finding the
shortest-length confidence interval based on a pivot are also applicable in finding
an interval that minimizes the expected length. We remark here that the
restriction to unbiased confidence intervals is natural if we wish to minimize
$E_\theta\{\overline{\theta}(\mathbf{X}) - \underline{\theta}(\mathbf{X})\}$. See Sections 11.4 and 11.5 for definitions and further
details.


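Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is
known. Consider the pivot

$$T_\mu(\mathbf{X}) = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}.$$

Then

$$1 - \alpha = P\Big\{a < \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} < b\Big\}
= P\Big\{\bar{X} - b\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} - a\frac{\sigma}{\sqrt{n}}\Big\}.$$

The length of this confidence interval is $(\sigma/\sqrt{n})(b - a)$. We wish to
minimize $L = (\sigma/\sqrt{n})(b - a)$ such that

$$\Phi(b) - \Phi(a) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx = \int_a^b \varphi(x)\, dx = 1 - \alpha.$$

Here $\varphi$ and $\Phi$, respectively, are the pdf and df of an $\mathcal{N}(0, 1)$ rv. Thus

$$\frac{dL}{da} = \frac{\sigma}{\sqrt{n}}\Big(\frac{db}{da} - 1\Big)$$

and

$$\varphi(b)\frac{db}{da} - \varphi(a) = 0,$$

giving

$$\frac{dL}{da} = \frac{\sigma}{\sqrt{n}}\Big[\frac{\varphi(a)}{\varphi(b)} - 1\Big].$$

The minimum occurs when $\varphi(a) = \varphi(b)$, that is, when $a = b$ or $a = -b$.
Since $a = b$ does not satisfy

$$\int_a^b \varphi(x)\, dx = 1 - \alpha,$$

we choose $a = -b$. The shortest confidence interval based on $T_\mu$ is
therefore the equal tails interval,

$$\Big(\bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big)
\quad \text{or} \quad
\Big(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big).$$

The length of this interval is $2z_{\alpha/2}(\sigma/\sqrt{n})$. In this case we can plan our
experiment to give a prescribed confidence level and a prescribed length
for the interval. To have level $1 - \alpha$ and length $\le 2d$, we choose the
smallest $n$ such that

$$d \ge z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad n \ge z_{\alpha/2}^2\, \frac{\sigma^2}{d^2}.$$

This can also be interpreted as follows. If we estimate $\mu$ by $\bar{X}$, taking a
sample of size $n \ge z_{\alpha/2}^2(\sigma^2/d^2)$, we are $100(1 - \alpha)$ percent confident that the
error in our estimate is at most $d$.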
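Both conclusions of Example 1 can be checked numerically. The sketch below is an illustration under assumptions made here (the grid search, the function names, and the sample values $\sigma = 2$, $d = 0.5$ are not from the text). It parametrizes the admissible pairs $(a, b)$ by the lower tail probability $\Phi(a)$, searches for the shortest interval, and computes the smallest $n$ with $n \ge z_{\alpha/2}^2\sigma^2/d^2$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pivot_interval_length(lower_tail, alpha):
    """Length b - a when Phi(a) = lower_tail and Phi(b) - Phi(a) = 1 - alpha."""
    a = STD_NORMAL.inv_cdf(lower_tail)
    b = STD_NORMAL.inv_cdf(lower_tail + 1 - alpha)
    return b - a


def best_lower_tail(alpha, grid=999):
    """Grid search over lower tail probabilities in (0, alpha) for the shortest b - a."""
    candidates = [alpha * k / (grid + 1) for k in range(1, grid + 1)]
    return min(candidates, key=lambda p: pivot_interval_length(p, alpha))


def min_sample_size(sigma, d, alpha):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / d) ** 2)


print(best_lower_tail(0.05))              # close to alpha / 2, i.e. the equal tails choice
print(min_sample_size(2.0, 0.5, 0.05))
```

The search returns a lower tail probability of about $\alpha/2$, which corresponds to $a = -b = -z_{\alpha/2}$.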


Example 2. In Example 1 suppose that $\sigma$ is unknown. In that case we use

$$T_\mu(\mathbf{X}) = \frac{(\bar{X} - \mu)\sqrt{n}}{S}$$

as a pivot. $T_\mu$ has Student's t-distribution with $n - 1$ d.f. Thus

$$1 - \alpha = P\Big\{a < \frac{(\bar{X} - \mu)\sqrt{n}}{S} < b\Big\}
= P\Big\{\bar{X} - b\frac{S}{\sqrt{n}} < \mu < \bar{X} - a\frac{S}{\sqrt{n}}\Big\}.$$

We wish to minimize

$$L = (b - a)\frac{S}{\sqrt{n}}$$

subject to

$$\int_a^b f_{n-1}(t)\, dt = 1 - \alpha,$$

where $f_{n-1}(t)$ is the pdf of $T_\mu$. We have

$$\frac{dL}{da} = \Big(\frac{db}{da} - 1\Big)\frac{S}{\sqrt{n}} \quad \text{and} \quad f_{n-1}(b)\frac{db}{da} - f_{n-1}(a) = 0,$$

giving

$$\frac{dL}{da} = \Big[\frac{f_{n-1}(a)}{f_{n-1}(b)} - 1\Big]\frac{S}{\sqrt{n}}.$$

It follows that the minimum occurs at $a = -b$ (the other solution, $a = b$,
is not admissible). The shortest-length confidence interval based on $T_\mu$ is
the equal tails interval,

$$\Big(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\Big).$$

The length of this interval is $2t_{n-1,\alpha/2}(S/\sqrt{n})$, which, being random, may
be arbitrarily large. Note that the same confidence interval minimizes the
expected length of the interval, namely, $EL = (b - a)c_n(\sigma/\sqrt{n})$, where $c_n$
is a constant determined from $ES = c_n\sigma$, and the minimum expected length
is $2t_{n-1,\alpha/2}\, c_n(\sigma/\sqrt{n})$. In this case Stein suggested a two-stage procedure that
is discussed in Section 14.4.

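Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ rv's. Suppose that $\mu$ is
known and we want a confidence interval for $\sigma^2$. The obvious choice for a
pivot $T_{\sigma^2}$ is given by

$$T_{\sigma^2}(\mathbf{X}) = \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2},$$

which has a chi-square distribution with $n$ d.f. Now

$$P\Big\{a < \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} < b\Big\} = 1 - \alpha,$$

so that

$$P\Big\{\frac{\sum_{i=1}^n (X_i - \mu)^2}{b} < \sigma^2 < \frac{\sum_{i=1}^n (X_i - \mu)^2}{a}\Big\} = 1 - \alpha.$$

We wish to minimize

$$L = \Big(\frac{1}{a} - \frac{1}{b}\Big)\sum_{i=1}^n (X_i - \mu)^2$$

subject to

$$\int_a^b f_n(t)\, dt = 1 - \alpha,$$

where $f_n$ is the pdf of a chi-square rv with $n$ d.f. We have

$$\frac{dL}{da} = \Big(-\frac{1}{a^2} + \frac{1}{b^2}\frac{db}{da}\Big)\sum_{i=1}^n (X_i - \mu)^2$$

and

$$\frac{db}{da} = \frac{f_n(a)}{f_n(b)},$$

so that

$$\frac{dL}{da} = \Big[-\frac{1}{a^2} + \frac{1}{b^2}\frac{f_n(a)}{f_n(b)}\Big]\sum_{i=1}^n (X_i - \mu)^2,$$

which vanishes if

$$a^2 f_n(a) = b^2 f_n(b).$$

Numerical results giving values of $a$ and $b$ to four significant places of
decimals are available; see Tate and Klett [129]. In practice, the equal tails
interval,

$$\Big(\frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}}\Big),$$

is used.

If $\mu$ is unknown, we use

$$T_{\sigma^2}(\mathbf{X}) = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$

as a pivot. $T_{\sigma^2}$ has a $\chi^2(n - 1)$ distribution. Proceeding as above, we can
show that the shortest-length confidence interval based on $T_{\sigma^2}$ is
$((n - 1)(S^2/b),\ (n - 1)(S^2/a))$; here $a$ and $b$ are a solution of

$$P\{a < \chi^2(n - 1) < b\} = 1 - \alpha$$

and

$$a^2 f_{n-1}(a) = b^2 f_{n-1}(b),$$

where $f_{n-1}$ is the pdf of a $\chi^2(n - 1)$ rv. Numerical solutions due to Tate
and Klett [129] may be used, but, in practice, the simpler equal tails
confidence interval,

$$\Big(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\Big),$$

is employed.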


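Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Then
$M_n = \max(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ with density

$$f_n(y \mid \theta) = \frac{n y^{n-1}}{\theta^n}, \quad 0 < y < \theta.$$

The rv $T_n = M_n/\theta$ has pdf

$$h(t) = n t^{n-1}, \quad 0 < t < 1.$$

Using $T_n$ as pivot, we see that the confidence interval is $(M_n/b, M_n/a)$ with
length $L = M_n(1/a - 1/b)$. We minimize $L$ subject to

$$\int_a^b n t^{n-1}\, dt = b^n - a^n = 1 - \alpha.$$

Now

$$(1 - \alpha)^{1/n} < b \le 1$$

and

$$\frac{dL}{db} = M_n\Big(-\frac{1}{a^2}\frac{da}{db} + \frac{1}{b^2}\Big) = M_n\, \frac{a^{n+1} - b^{n+1}}{b^2 a^{n+1}} < 0,$$

so that the minimum occurs at $b = 1$. The shortest interval is therefore
$(M_n, M_n/\alpha^{1/n})$. Note that

$$EL = \Big(\frac{1}{a} - \frac{1}{b}\Big)EM_n = \frac{n\theta}{n+1}\Big(\frac{1}{a} - \frac{1}{b}\Big),$$

which is minimized subject to

$$b^n - a^n = 1 - \alpha,$$

where $b = 1$ and $a = \alpha^{1/n}$. The expected length of the interval that minimizes
$EL$ is $[(1/\alpha^{1/n}) - 1][n\theta/(n + 1)]$, which is also the expected length of
the shortest confidence interval based on $M_n$.

Note that the length of the interval $(M_n, \alpha^{-1/n} M_n)$ goes to 0 as $n$ becomes
large.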

For some results on asymptotically shortest-length confidence intervals,
we refer the reader to Wilks [141], pages 374-376.


PROBLEMS 11.3

1. Let $X_1, X_2, \ldots, X_n$ be a sample from

$$f_\theta(x) = \begin{cases} e^{-(x - \theta)}, & x > \theta, \\ 0, & \text{otherwise.} \end{cases}$$

Find the shortest-length confidence interval for $\theta$ at level $1 - \alpha$, based on a
sufficient statistic for $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \theta)$. Find the shortest-length
confidence interval for $\theta$ at level $1 - \alpha$, based on a sufficient statistic for $\theta$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $B(\theta, 1)$. Find the shortest-length
confidence interval for $\theta$ at level $1 - \alpha$, based on a sufficient statistic for $\theta$.

4. In Problem 11.2.9 how will you find the shortest-length confidence interval
for $\theta$ at level $1 - \alpha$, based on the statistic $X/\theta$?

5. Let $T(\mathbf{X}, \theta)$ be a pivot of the form $T(\mathbf{X}, \theta) = T_1(\mathbf{X}) - \theta$. Show how one can
construct a confidence interval for $\theta$ with fixed width $d$ and maximum possible
confidence coefficient. In particular, construct a confidence interval that has fixed
width $d$ and maximum possible confidence coefficient for the mean $\mu$ of a normal
population with variance 1. Find the smallest size $n$ for which this confidence
interval has a confidence coefficient $\ge 1 - \alpha$. Repeat the above in sampling from
an exponential pdf

$$f_\mu(x) = e^{\mu - x} \ \text{ for } x > \mu \quad \text{and} \quad f_\mu(x) = 0 \ \text{ for } x \le \mu. \qquad \text{(Desu [24])}$$


11.4 RELATION BETWEEN CONFIDENCE ESTIMATION AND HYPOTHESES TESTING


In this section we investigate the relationship between confidence estimation
and hypotheses testing. More precisely, given a level $\alpha$ test of $H_0\colon \theta = \theta_0$,
say, is it possible to construct a $1 - \alpha$ level confidence interval for $\theta$, and
conversely?


Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma_0^2)$. In Example
11.2.1 we showed that

$$\Big(\bar{X} - \frac{\sigma_0}{\sqrt{n}}\, z_{\alpha/2},\ \bar{X} + \frac{\sigma_0}{\sqrt{n}}\, z_{\alpha/2}\Big)$$

is a $1 - \alpha$ level confidence interval for $\mu$. If we define a test $\varphi$ that rejects
a value of $\mu = \mu_0$ if and only if $\mu_0$ lies outside this interval, that is, if and
only if

$$\frac{\sqrt{n}\,|\bar{X} - \mu_0|}{\sigma_0} \ge z_{\alpha/2},$$

then

$$P_{\mu_0}\Big\{\frac{\sqrt{n}\,|\bar{X} - \mu_0|}{\sigma_0} \ge z_{\alpha/2}\Big\} = \alpha,$$

and the test $\varphi$ is a size $\alpha$ test of $\mu = \mu_0$ against the alternatives $\mu \ne \mu_0$.
Conversely, a family of $\alpha$ level tests for the hypothesis $\mu = \mu_0$ generates
a family of confidence intervals for $\mu$ by simply taking, as the confidence
interval for $\mu$, the set of those $\mu_0$ for which one cannot reject $H_0\colon \mu = \mu_0$.
Similarly, we can generate a family of $\alpha$ level tests from a $1 - \alpha$ level
lower (or upper) confidence bound. Suppose that we start with the $1 - \alpha$
level lower confidence bound $\bar{X} - z_\alpha(\sigma_0/\sqrt{n})$. Then, by defining a test $\varphi(\mathbf{X})$
that rejects $\mu \le \mu_0$ if and only if $\mu_0 < \bar{X} - z_\alpha(\sigma_0/\sqrt{n})$, we get an $\alpha$ level
test for a hypothesis of the form $\mu \le \mu_0$.


Example 1 is a special case of the duality principle proved in Theorem 1
below. In the following we restrict our attention to the case in which the
rejection (acceptance) region of the test is the indicator function of a
(Borel-measurable) set, that is, we consider only nonrandomized tests (and
confidence intervals). For notational convenience we write $H_0(\theta_0)$ for the
hypothesis $H_0\colon \theta = \theta_0$ and $H_1(\theta_0)$ for the alternative hypothesis, which may be
one- or two-sided.


Theorem 1. Let $A(\theta_0)$, $\theta_0 \in \Theta$, denote the region of acceptance of an $\alpha$ level
test of $H_0(\theta_0)$. For each observation $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ let $S(\mathbf{x})$ denote the set

$$S(\mathbf{x}) = \{\theta\colon \mathbf{x} \in A(\theta),\ \theta \in \Theta\}. \tag{1}$$

Then $S(\mathbf{x})$ is a family of confidence sets for $\theta$ at confidence level $1 - \alpha$.
If, moreover, $A(\theta_0)$ is UMP for the problem $(\alpha, H_0(\theta_0), H_1(\theta_0))$, then $S(\mathbf{X})$
minimizes

$$P_\theta\{S(\mathbf{X}) \ni \theta'\} \quad \text{for all } \theta \in H_1(\theta') \tag{2}$$

among all $1 - \alpha$ level families of confidence sets.


Proof. We have

(3) $S(x) \ni \theta$ if and only if $x \in A(\theta)$,

so that

    $P_\theta\{S(X) \ni \theta\} = P_\theta\{X \in A(\theta)\} \ge 1 - \alpha$,

as asserted.


If $S^*(X)$ is any other family of $1 - \alpha$ level confidence sets, let
$A^*(\theta) = \{x : S^*(x) \ni \theta\}$. Then

    $P_\theta\{X \in A^*(\theta)\} = P_\theta\{S^*(X) \ni \theta\} \ge 1 - \alpha$,

and, since $A(\theta_0)$ is UMP for $(\alpha, H_0(\theta_0), H_1(\theta_0))$, it follows that

    $P_\theta\{X \in A^*(\theta_0)\} \ge P_\theta\{X \in A(\theta_0)\}$ for any $\theta \in H_1(\theta_0)$.

Hence

    $P_\theta\{S^*(X) \ni \theta_0\} \ge P_\theta\{X \in A(\theta_0)\} = P_\theta\{S(X) \ni \theta_0\}$

for all $\theta \in H_1(\theta_0)$. This completes the proof.
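As an illustration of relation (3), the following minimal Python sketch inverts a two-sided z-test for a normal mean with known standard deviation and checks, on a grid of candidate values, that a value lies in the confidence interval exactly when the test accepts it. The function names, the data, and the choice of the two-sided z-test are assumptions made for this illustration only.

```python
from math import sqrt
from statistics import NormalDist, fmean


def accepts(xs, mu0, sigma, alpha):
    """Two-sided size-alpha z-test of H0: mu = mu0; True if H0 is not rejected."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(sqrt(len(xs)) * (fmean(xs) - mu0) / sigma) <= z


def confidence_interval(xs, sigma, alpha):
    """The set S(x) = {mu0 : x in A(mu0)}, which here is a closed interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / sqrt(len(xs))
    xbar = fmean(xs)
    return xbar - half, xbar + half


# Hypothetical data; sigma = 1 is assumed known.
xs = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2]
lo, hi = confidence_interval(xs, sigma=1.0, alpha=0.05)

# Relation (3): mu0 is in S(x) if and only if x is in A(mu0).
grid = [3.0 + 0.01 * i for i in range(401)]
agree = all(accepts(xs, m, 1.0, 0.05) == (lo <= m <= hi) for m in grid)
```

The grid check is only a numerical spot check of the equivalence, not a proof.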


Remark 1. In view of Definition 11.2.5, the family of confidence sets as- 
sociated with a UMP acceptance region is a UMA family at level 1 — a. 


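Example 2. Let $X$ be an rv of the continuous type with a one-parameter
exponential pdf, given by

    $f_\theta(x) = \exp\{Q(\theta) T(x) + S'(x) + D(\theta)\}$,

where $Q(\theta)$ is a nondecreasing function of $\theta$. Let $H_0: \theta = \theta_0$ and
$H_1: \theta < \theta_0$. Then the acceptance region of a UMP size $\alpha$ test of $H_0$ is
of the form

    $A(\theta_0) = \{x : T(x) > c(\theta_0)\}$.

Since, for $\theta \ge \theta'$,

    $P_{\theta'}\{T(X) \le c(\theta')\} = \alpha = P_\theta\{T(X) \le c(\theta)\} \le P_{\theta'}\{T(X) \le c(\theta)\}$,

$c(\theta)$ may be chosen to be nondecreasing. (The last inequality follows because
the power of the UMP test is at least $\alpha$, the size.) We have

    $S(x) = \{\theta : x \in A(\theta)\}$,

so that $S(x)$ is of the form $(-\infty, c^{-1}(T(x)))$, or $(-\infty, c^{-1}(T(x))]$, where
$c^{-1}$ is defined by

    $c^{-1}(T(x)) = \sup\{\theta : c(\theta) \le T(x)\}$.

In particular, if

    $f_\theta(x) = \frac{1}{\theta} e^{-x/\theta}$ for $x > 0$, and $f_\theta(x) = 0$ otherwise,

then $T(x) = x$; and, for testing $H_0: \theta = \theta_0$ against $H_1: \theta < \theta_0$ the UMP
acceptance region is of the form

    $A(\theta_0) = \{x : x \ge c(\theta_0)\}$,

where

    $c(\theta_0) = \theta_0 \log \frac{1}{1 - \alpha}$, $0 < \alpha < 1$.

The UMA family of $1 - \alpha$ level confidence sets is of the form

    $S(x) = \{\theta : x \in A(\theta)\} = \left(0,\ \frac{x}{\log[1/(1 - \alpha)]}\right]$.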
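The exponential case of Example 2 can be checked by simulation. The sketch below assumes a single observation from the exponential density with mean $\theta$, uses the upper bound $x/\log[1/(1-\alpha)]$, and estimates how often the bound lies above the true $\theta$. The function names, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random


def upper_bound(x, alpha):
    """Upper confidence bound x / log(1/(1 - alpha)) for the exponential mean."""
    return x / math.log(1.0 / (1.0 - alpha))


def coverage(theta, alpha, trials, seed=0):
    """Monte Carlo estimate of P_theta{theta <= upper_bound(X, alpha)}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.expovariate(1.0 / theta)  # exponential rv with mean theta
        if theta <= upper_bound(x, alpha):
            hits += 1
    return hits / trials


# The estimate should be close to 1 - alpha = 0.9.
est = coverage(theta=2.0, alpha=0.1, trials=100_000)
```

Analytically the coverage is $P_\theta\{X \ge \theta \log[1/(1-\alpha)]\} = 1 - \alpha$, so the simulation only confirms the arithmetic.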
Since UMP tests generally do not exist, we restrict consideration to smaller
subclasses of tests and look for UMP tests in these classes (see Chapter 9).
We will show that UMP unbiased tests lead to UMA unbiased confidence
sets. This analogy carries over to UMP invariant tests also, but we will not
show that here.


Definition 1. A family $\{S(x)\}$ of confidence sets for a parameter $\theta$ is said to
be unbiased at confidence level $1 - \alpha$ if

(4) $P_\theta\{S(X) \text{ contains } \theta\} \ge 1 - \alpha$

and

(5) $P_\theta\{S(X) \text{ contains } \theta'\} \le 1 - \alpha$ for all $\theta, \theta' \in \Theta$, $\theta \neq \theta'$.


If $S(X)$ is an interval satisfying (4) and (5), we call it a $1 - \alpha$ level unbiased
confidence interval. If a family of unbiased confidence sets at level $1 - \alpha$
is UMA in the class of all $1 - \alpha$ level unbiased confidence sets, we call it
a UMA unbiased (UMAU) family of confidence sets at level $1 - \alpha$. In
other words, if $S^*(x)$ satisfies (4) and (5) and minimizes

    $P_\theta\{S(X) \text{ contains } \theta'\}$ for $\theta, \theta' \in \Theta$, $\theta \neq \theta'$,

among all unbiased families of confidence sets $S(X)$ at level $1 - \alpha$, then
$S^*(X)$ is a UMAU family of confidence sets at level $1 - \alpha$.


Remark 2. Definition 1 says that a family $S(X)$ of confidence sets for a
parameter $\theta$ is unbiased at level $1 - \alpha$ if $S(X)$ traps the true parameter
value with probability at least $1 - \alpha$ and $S(X)$ traps a false parameter value
with probability at most $1 - \alpha$. In other words, $S(X)$ traps a true parameter
value more often than it does a false one.


Theorem 2. Let $A(\theta_0)$ be the acceptance region of a UMP unbiased size $\alpha$
test of $H_0(\theta_0): \theta = \theta_0$ against $H_1(\theta_0): \theta \neq \theta_0$ for each $\theta_0$. Then
$S(x) = \{\theta : x \in A(\theta)\}$ is a UMA unbiased family of confidence sets at
level $1 - \alpha$.


Proof. To see that $S(x)$ is unbiased we note that, since $A(\theta)$ is the acceptance
region of an unbiased test, for $\theta' \neq \theta$,

    $P_{\theta'}\{S(X) \text{ contains } \theta\} = P_{\theta'}\{X \in A(\theta)\} \le 1 - \alpha$.


We next show that $S(X)$ is UMA. Let $S^*(x)$ be any other unbiased $1 - \alpha$
level family of confidence sets, and write $A^*(\theta) = \{x : S^*(x) \text{ contains } \theta\}$.
Then $P_{\theta'}\{X \in A^*(\theta)\} = P_{\theta'}\{S^*(X) \text{ contains } \theta\} \le 1 - \alpha$, and it follows
that $A^*(\theta)$ is the acceptance region of an unbiased size $\alpha$ test. Hence

    $P_{\theta'}\{S^*(X) \text{ contains } \theta\} = P_{\theta'}\{X \in A^*(\theta)\}$
        $\ge P_{\theta'}\{X \in A(\theta)\}$
        $= P_{\theta'}\{S(X) \text{ contains } \theta\}$.


The inequality follows since $A(\theta)$ is the acceptance region of a UMP
unbiased test. This completes the proof.


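Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown. For testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$, we know
(Theorem 9.5.4) that the t-test

    $\varphi(x) = 1$ if $\left|\sqrt{n}(\bar{x} - \mu_0)/s\right| > c$, and $\varphi(x) = 0$ otherwise,

where $\bar{x} = \sum x_i / n$ and $s^2 = (n - 1)^{-1} \sum (x_i - \bar{x})^2$, is UMP unbiased. We
choose $c$ from the size requirement

    $\alpha = P_{\mu_0}\left\{\left|\sqrt{n}(\bar{X} - \mu_0)/S\right| > c\right\}$,

so that $c = t_{n-1,\alpha/2}$. Thus

    $A(\mu_0) = \left\{x : \left|\sqrt{n}(\bar{x} - \mu_0)/s\right| \le t_{n-1,\alpha/2}\right\}$

is the acceptance region of a UMP unbiased size $\alpha$ test of $H_0: \mu = \mu_0$ against
$H_1: \mu \neq \mu_0$. By Theorem 2, it follows that

    $S(x) = \{\mu : x \in A(\mu)\}$
        $= \left\{\mu : \bar{x} - \frac{s}{\sqrt{n}}\, t_{n-1,\alpha/2} \le \mu \le \bar{x} + \frac{s}{\sqrt{n}}\, t_{n-1,\alpha/2}\right\}$

is a UMA unbiased family of confidence sets at level $1 - \alpha$.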
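The unbiasedness property of the t-interval can be seen in a small simulation. The sketch below assumes $n = 10$ and takes the critical value $t_{9,\,0.025} \approx 2.262$ from a standard t-table (an assumed constant, since the Python standard library has no t quantile function). It estimates how often the interval traps the true mean and how often it traps one particular false value.

```python
import math
import random
from statistics import fmean, stdev

T_CRIT = 2.262  # assumed table value of t_{9, 0.025}, valid for n = 10 only


def t_interval(xs, t_crit):
    """Interval xbar -/+ t_crit * s / sqrt(n), with s the sample standard deviation."""
    half = t_crit * stdev(xs) / math.sqrt(len(xs))
    m = fmean(xs)
    return m - half, m + half


def trap_rate(mu_true, mu_check, n=10, trials=20_000, seed=1):
    """Fraction of simulated N(mu_true, 1) samples whose interval contains mu_check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu_true, 1.0) for _ in range(n)]
        lo, hi = t_interval(xs, T_CRIT)
        hits += lo <= mu_check <= hi
    return hits / trials


true_rate = trap_rate(0.0, 0.0)   # should be near 0.95
false_rate = trap_rate(0.0, 0.5)  # should be smaller than true_rate
```

Only one false value is examined here; Definition 1 requires the inequality for every false value.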


Remark 3. Example 3 and the results of Section 9.5 show that the concept
of unbiasedness is most suitable where the UMP tests do not exist. This
is the case when the parameter set $\Theta$ consists of points $(\theta, \mu)$, say, where
both $\theta$ and $\mu$ are unknown, and one is interested in obtaining confidence
sets for the parameter $\theta$ alone. The parameter $\mu$ is usually referred to as
a nuisance parameter.


Remark 4. In Section 11.3 we remarked that shortest-length confidence
intervals, although very desirable, do not exist for most commonly used
distributions. Pratt [90] has shown that the restriction to the unbiased family
of confidence intervals makes it possible, at least in many commonly
encountered problems, to obtain $1 - \alpha$ level confidence intervals that have
uniformly minimum expected length among all $1 - \alpha$ level unbiased
confidence intervals. We will explore this topic in somewhat greater detail in the
next section.


PROBLEMS 11.4


1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. Find a
UMA $1 - \alpha$ level upper confidence bound for $\mu$.


2. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Find a UMA family of confidence
intervals for $\theta$ at level $1 - \alpha$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu_0, \sigma^2)$, where $\sigma^2$ is unknown and $\mu_0$
is a known constant. Find a UMA unbiased family of confidence intervals for $\sigma^2$
at level $1 - \alpha$.


4. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from normal
distributions $N(\mu, \sigma^2)$ and $N(\theta, \sigma^2)$, respectively. Find a UMA family of
unbiased confidence intervals for $\mu - \theta$ at level $1 - \alpha$.


5. In Problem 4, if the variances of the two normal populations are different, say
$\sigma^2$ and $\tau^2$, respectively, find a UMA family of unbiased confidence intervals for
the ratio $\tau^2/\sigma^2$ at level $1 - \alpha$.


11.5 UNBIASED CONFIDENCE INTERVALS 


In the earlier sections we emphasized that, if the shortest length is used as a
measure of precision of the confidence interval, it is not possible to find
intervals that have uniformly shortest length among all $1 - \alpha$ level confidence
intervals, even for some of the most commonly used distributions. In view of
the duality principle (Theorem 11.4.1), since the UMP tests do not exist in
general, one frequently restricts attention to the class of unbiased confidence
intervals (Theorem 11.4.2). This is especially true if a nuisance parameter is
present. Indeed, Pratt [90] has shown that, if the measure of precision is the
expected length of the confidence interval, one is naturally led to a
consideration of the unbiased confidence intervals. The following result is due to
Pratt [90].

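Theorem 1. Let $\Theta$ be an interval on the real line, and $f_\theta$ be the pdf of $X$.
Let $S(X)$ be a family of $1 - \alpha$ level confidence intervals of finite length, that
is, $S(X) = (\underline{\theta}(X), \bar{\theta}(X))$, and suppose that $\bar{\theta}(X) - \underline{\theta}(X)$ is (random) finite.
Then

(1) $\int (\bar{\theta}(x) - \underline{\theta}(x))\, f_\theta(x)\, dx = \int_{\theta' \neq \theta} P_\theta\{S(X) \text{ contains } \theta'\}\, d\theta'$

for all $\theta \in \Theta$.


Proof. We have

    $\bar{\theta} - \underline{\theta} = \int_{\underline{\theta}}^{\bar{\theta}} d\theta'$.

Thus for all $\theta \in \Theta$

    $E_\theta\{\bar{\theta}(X) - \underline{\theta}(X)\} = E_\theta\left\{\int_{\underline{\theta}}^{\bar{\theta}} d\theta'\right\}$
        $= \int \left(\int_{\underline{\theta}(x)}^{\bar{\theta}(x)} d\theta'\right) f_\theta(x)\, dx$
        $= \int \left(\int_{\{x:\ \underline{\theta}(x) < \theta' < \bar{\theta}(x)\}} f_\theta(x)\, dx\right) d\theta'$
        $= \int P_\theta\{S(X) \text{ contains } \theta'\}\, d\theta'$
        $= \int_{\theta' \neq \theta} P_\theta\{S(X) \text{ contains } \theta'\}\, d\theta'$.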


Remark 1. Theorem 1 says that the expected length of the confidence interval
is the probability that $S(X)$ includes $\theta'$, integrated over all false values
$\theta'$ of $\theta$.


Remark 2. If $S(X)$ is a family of UMAU $1 - \alpha$ level confidence intervals,
the expected length of $S(X)$ is minimal. This follows since the left-hand side
of (1) is the expected length, if $\theta$ is the true value, of $S(X)$ and
$P_\theta\{S(X) \text{ contains } \theta'\}$ is minimal [because $S(X)$ is UMAU], by Theorem 11.4.2,
with respect to all families of $1 - \alpha$ level unbiased confidence intervals
uniformly in $\theta$ ($\theta \neq \theta'$).
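Identity (1) can be verified numerically in a simple case. The sketch below assumes the interval $\bar{X} \pm z_{\alpha/2}/\sqrt{n}$ for the mean of $N(\theta, 1)$ observations (an illustrative choice, not one made in the text), whose length is constant, and compares that length with a trapezoidal approximation of the integral of the trapping probability over $\theta'$. All names and numerical settings are assumptions.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()


def trap_probability(theta, theta_prime, n, alpha):
    """P_theta{ xbar - z/sqrt(n) < theta_prime < xbar + z/sqrt(n) } for N(theta, 1) data."""
    z = PHI.inv_cdf(1 - alpha / 2)
    d = sqrt(n) * (theta_prime - theta)
    return PHI.cdf(z - d) - PHI.cdf(-z - d)


def integrated_trap_probability(theta, n, alpha, half_width=12.0, steps=20_000):
    """Trapezoidal approximation to the right-hand side of (1)."""
    a = theta - half_width / sqrt(n)
    b = theta + half_width / sqrt(n)
    h = (b - a) / steps
    total = 0.5 * (trap_probability(theta, a, n, alpha)
                   + trap_probability(theta, b, n, alpha))
    for i in range(1, steps):
        total += trap_probability(theta, a + i * h, n, alpha)
    return total * h


n, alpha = 16, 0.05
length = 2 * PHI.inv_cdf(1 - alpha / 2) / sqrt(n)  # left-hand side of (1)
area = integrated_trap_probability(theta=3.0, n=n, alpha=alpha)
```

The single point $\theta' = \theta$ has measure zero, so including it in the numerical integral does not change the value.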


Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown. To find a $1 - \alpha$ level unbiased confidence interval for
$\mu$ we proceed as in Example 11.3.2. The statistic

    $T(X, \mu) = \frac{\sqrt{n}(\bar{X} - \mu)}{S}$

has Student's t-distribution with $n - 1$ d.f. and leads to the shortest-length
confidence interval

    $S(X) = \left(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right)$.

Moreover, by Theorems 9.5.4 and 11.4.2 (or by Example 11.4.3), $S(X)$ is a
UMAU family of confidence sets at level $1 - \alpha$. It follows by Remark 2
that the expected length of $S(X)$ is minimal.


Since a reasonably complete discussion of UMP unbiased tests (see Section
9.5) is beyond the scope of this text, the following procedure for determining
unbiased confidence intervals is sometimes quite useful. (See Guenther [41].)
Let $X_1, X_2, \ldots, X_n$ be a sample from an absolutely continuous df with pdf
$f_\theta(x)$, and suppose that we seek an unbiased confidence interval for $\theta$.
Following the discussion in Section 11.3, suppose that

    $T(X_1, X_2, \ldots, X_n, \theta) = T(X, \theta) = T_\theta$

is a pivot, and suppose that the statement

    $P\{\lambda_1(\alpha) < T_\theta < \lambda_2(\alpha)\} = 1 - \alpha$

can be converted to

    $P_\theta\{\underline{\theta}(X) < \theta < \bar{\theta}(X)\} = 1 - \alpha$.

In order for $(\underline{\theta}, \bar{\theta})$ to be unbiased, we must have

(2) $P(\theta, \theta') = P_\theta\{\underline{\theta}(X) < \theta' < \bar{\theta}(X)\} = 1 - \alpha$ if $\theta' = \theta$

and

(3) $P(\theta, \theta') < 1 - \alpha$ if $\theta' \neq \theta$.

If $P(\theta, \theta')$ depends only on a function $\tau$ of $\theta, \theta'$, we may write

(4) $P(\tau) = 1 - \alpha$ if $\theta' = \theta$, and $P(\tau) < 1 - \alpha$ if $\theta' \neq \theta$,

and it follows that $P(\tau)$ has a maximum at $\theta' = \theta$.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ rv's, and suppose that we
desire an unbiased confidence interval for $\sigma^2$. Then

    $T(X, \sigma^2) = \frac{(n - 1)S^2}{\sigma^2} = T_\sigma$

has a $\chi^2(n - 1)$ distribution, and we have

    $P\left\{\lambda_1 < \frac{(n - 1)S^2}{\sigma^2} < \lambda_2\right\} = 1 - \alpha$,

so that

    $P\left\{\frac{(n - 1)S^2}{\lambda_2} < \sigma^2 < \frac{(n - 1)S^2}{\lambda_1}\right\} = 1 - \alpha$.

Then

    $P(\sigma^2, \sigma'^2) = P_{\sigma^2}\left\{\frac{(n - 1)S^2}{\lambda_2} < \sigma'^2 < \frac{(n - 1)S^2}{\lambda_1}\right\}$
        $= P\left\{\frac{T_\sigma}{\lambda_2} < \tau < \frac{T_\sigma}{\lambda_1}\right\}$
        $= P\{\lambda_1 \tau < T_\sigma < \lambda_2 \tau\}$,

where $\tau = \sigma'^2/\sigma^2$ and $T_\sigma \sim \chi^2(n - 1)$. Thus

    $P(\tau) = P\{\lambda_1 \tau < T_\sigma < \lambda_2 \tau\}$.

Equations (2) and (3) become

    $P(1) = 1 - \alpha$

and

    $P(\tau) < 1 - \alpha$.

Thus we need $\lambda_1, \lambda_2$ such that

(4) $P(1) = 1 - \alpha$

and

(5) $\left.\frac{dP(\tau)}{d\tau}\right|_{\tau = 1} = \lambda_2 f_{n-1}(\lambda_2) - \lambda_1 f_{n-1}(\lambda_1) = 0$,

where $f_{n-1}$ is the pdf of $T_\sigma$. Equations (4) and (5) have been solved
numerically for $\lambda_1, \lambda_2$ by several authors; see, for example, Tate and Klett [129].
Having obtained $\lambda_1, \lambda_2$ from (4) and (5), we have as the unbiased $1 - \alpha$
level confidence interval

(6) $\left(\frac{(n - 1)S^2}{\lambda_2},\ \frac{(n - 1)S^2}{\lambda_1}\right)$,

which, in view of Theorem 9.5.4, is also the UMAU confidence interval at
level $1 - \alpha$. To see this, one simply has to note that, in the notation of
Theorem 9.5.4, the power function of a UMP unbiased test of $\sigma^2 = \sigma_0^2$,
namely,

    $1 - P_{\sigma^2}\{c_1 \sigma_0^2 < (n - 1)S^2 < c_2 \sigma_0^2\}$,

has a minimum at $\sigma^2 = \sigma_0^2$, which leads to conditions (4) and (5). (Compare
with the conditions on $c_1, c_2$ in Remark 9.5.4.) By Remark 2 it follows that
(6) minimizes the expected length with respect to all families of $1 - \alpha$ level
unbiased confidence intervals uniformly in $\sigma^2$ ($\neq \sigma'^2$).

Note that in this case the shortest-length confidence interval (based on
$T_\sigma$) derived in Example 11.3.3, the usual equal tails confidence interval, and
(6) are all different. The length of the confidence interval (6), however, can
be considerably greater than that of the shortest interval of Example 11.3.3.
For large $n$ all three sets of intervals are approximately the same.
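Conditions (4) and (5) are easy to solve numerically. The following sketch is one possible approach, not the method of Tate and Klett: it evaluates the chi-square df through the power series for the regularized incomplete gamma function and uses nested bisection. For each trial $\lambda_1$ below the mode of $x f_{n-1}(x)$ it finds the matching $\lambda_2$ above the mode, then adjusts $\lambda_1$ until the coverage equals $1 - \alpha$. All function names and the choice of 9 d.f. are assumptions for illustration.

```python
import math


def chi2_cdf(x, df):
    """CDF of chi-square(df) via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= y / (a + k)
        total += term
    return total * math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))


def log_g(x, df):
    """log of x * f_df(x), up to an additive constant; maximal at x = df."""
    return (df / 2.0) * math.log(x) - x / 2.0


def partner(lam1, df):
    """The lam2 > df with lam2 * f(lam2) = lam1 * f(lam1), i.e. condition (5)."""
    target = log_g(lam1, df)
    lo, hi = float(df), 2.0 * df
    while log_g(hi, df) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_g(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def unbiased_limits(df, alpha):
    """Solve (4) and (5) for (lam1, lam2) by bisection on lam1."""
    lo, hi = 1e-8 * df, float(df)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        cover = chi2_cdf(partner(mid, df), df) - chi2_cdf(mid, df)
        if cover > 1.0 - alpha:
            lo = mid  # interval too wide: move lam1 toward the mode
        else:
            hi = mid
    lam1 = 0.5 * (lo + hi)
    return lam1, partner(lam1, df)


# n = 10 observations give n - 1 = 9 degrees of freedom.
lam1, lam2 = unbiased_limits(df=9, alpha=0.05)
```

With $\lambda_1, \lambda_2$ in hand, interval (6) is $((n-1)s^2/\lambda_2,\ (n-1)s^2/\lambda_1)$ for the observed $s^2$.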


PROBLEMS 11.5 


1. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Show that the unbiased confidence
interval for $\theta$ based on the pivot $\max X_i/\theta$ coincides with the shortest-length
confidence interval based on the same pivot.


2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \theta)$. Find the unbiased confidence
interval for $\theta$, based on the pivot $2\sum_{i=1}^{n} X_i/\theta$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from pdf

    $f_\theta(x) = e^{-(x - \theta)}$ if $x > \theta$, and $f_\theta(x) = 0$ otherwise.

Find the unbiased confidence interval based on the pivot $2n[\min X_i - \theta]$.


11.6 BAYES CONFIDENCE INTERVALS 


So far we have considered the problem of confidence estimation by regarding
the parameter $\theta$ as a fixed unknown quantity. Another approach to the
problem takes into account any prior knowledge that the experimenter has
about the parameter. Let $\Theta$ be the parameter set, and let the observable rv
$X$ have pdf (pmf) $f_\theta(x)$. Suppose that we consider $\theta$ as an rv with distribution
$\pi(\theta)$ on $\Theta$. Then $f_\theta(x)$ can be considered as the conditional pdf (pmf) of $X$,
given that the rv $\theta$ takes the value $\theta$. Note that we are using the same symbol
for the rv $\theta$ and the values that it assumes. We can determine the joint
distribution of $X$ and $\theta$, the marginal distribution of $X$, and also the conditional
distribution of $\theta$, given $X = x$, as usual. Thus the joint distribution is given by

(1) $f(x, \theta) = \pi(\theta) f_\theta(x)$,

and the marginal distribution of $X$ by

(2) $g(x) = \sum_\theta \pi(\theta) f_\theta(x)$ if $\pi$ is a pmf, and $g(x) = \int \pi(\theta) f_\theta(x)\, d\theta$ if $\pi$ is a pdf.

The conditional distribution of $\theta$, given that $x$ is observed, is given by

(3) $h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{g(x)}$, $g(x) > 0$.


Given $h(\theta \mid x)$, it is easy to find functions $l(x)$, $u(x)$ such that

    $P\{l(X) < \theta < u(X)\} \ge 1 - \alpha$,

where

(4) $P\{l(X) < \theta < u(X) \mid X = x\} = \int_l^u h(\theta \mid x)\, d\theta$ or $\sum_l^u h(\theta \mid x)$,

depending on whether $h$ is a pdf or a pmf.


Definition 1. An interval $(l(x), u(x))$ that has probability at least $1 - \alpha$ of
including $\theta$ is called a $1 - \alpha$ level Bayes interval for $\theta$. Also, $l(x)$ and $u(x)$
are called the lower and upper limits of the interval.


One can similarly define one-sided Bayes intervals or $1 - \alpha$ level lower
and upper Bayes limits. It is clear that one chooses the shortest-length
Bayesian confidence interval among all $1 - \alpha$ level confidence intervals.

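Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$, $\mu \in \mathcal{R}$, and let the a priori
distribution of $\mu$ be $N(0, 1)$. Then from Example 8.8.6 we know that
$h(\mu \mid x)$ is

    $N\left(\frac{\sum_{i=1}^{n} x_i}{n + 1},\ \frac{1}{n + 1}\right)$.

Thus a $1 - \alpha$ level Bayesian confidence interval is

    $\left(\frac{n\bar{x}}{n + 1} - \frac{z_{\alpha/2}}{\sqrt{n + 1}},\ \frac{n\bar{x}}{n + 1} + \frac{z_{\alpha/2}}{\sqrt{n + 1}}\right)$.

A $1 - \alpha$ level confidence interval for $\mu$ (treating $\mu$ as fixed) is a random
interval with value

    $\left(\bar{x} - \frac{z_{\alpha/2}}{\sqrt{n}},\ \bar{x} + \frac{z_{\alpha/2}}{\sqrt{n}}\right)$.

Thus the Bayesian interval is somewhat shorter in length. This is to be
expected since we assumed more in the Bayesian case.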
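The comparison in Example 1 can be reproduced directly. This minimal sketch computes both intervals for a hypothetical data set, using the posterior $N(n\bar{x}/(n+1),\ 1/(n+1))$ for the Bayesian interval, and confirms that the Bayesian interval carries posterior probability $1 - \alpha$. The function names and data are assumptions.

```python
from math import sqrt
from statistics import NormalDist, fmean


def bayes_interval(xs, alpha):
    """Interval centered at the posterior mean n*xbar/(n+1), posterior sd 1/sqrt(n+1)."""
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = n * fmean(xs) / (n + 1)
    half = z / sqrt(n + 1)
    return center - half, center + half


def classical_interval(xs, alpha):
    """Usual interval xbar -/+ z/sqrt(n) for N(mu, 1) data with mu treated as fixed."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z / sqrt(len(xs))
    return fmean(xs) - half, fmean(xs) + half


xs = [0.8, 1.4, 0.3, 1.9, 1.1]  # hypothetical observations
b_lo, b_hi = bayes_interval(xs, 0.05)
c_lo, c_hi = classical_interval(xs, 0.05)

# Posterior probability that mu lies in the Bayesian interval.
n = len(xs)
posterior = NormalDist(mu=n * fmean(xs) / (n + 1), sigma=1 / sqrt(n + 1))
mass = posterior.cdf(b_hi) - posterior.cdf(b_lo)
```

The ratio of the two lengths is $\sqrt{n/(n+1)}$, so the advantage of the Bayesian interval disappears as $n$ grows.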


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ rv's, and let the a priori pdf
on $\Theta = (0, 1)$ be $U(0, 1)$. In Example 8.8.8 we saw that the a posteriori
pdf of $p$, given $X_1 + X_2 + \cdots + X_n = t$, is given by

    $h(p \mid t) = \frac{p^{t}(1 - p)^{n - t}}{B(t + 1,\ n - t + 1)}$, $0 < p < 1$.

Given suitable tables of incomplete beta integrals and the observed value
of $t$, one can easily construct a Bayesian interval estimate of $p$.
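In place of tables, the incomplete beta integral for this posterior can be computed exactly, since a beta df with integer parameters equals a binomial tail probability. The sketch below uses that identity and bisection to produce an equal-tails Bayesian interval for $p$. The equal-tails choice, the function names, and the values $n = 20$, $t = 6$ are assumptions for illustration; the equal-tails interval is generally not the shortest one.

```python
from math import comb


def posterior_cdf(p, n, t):
    """CDF at p of the Beta(t+1, n-t+1) posterior, as P{Bin(n+1, p) >= t+1}."""
    m = n + 1
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(t + 1, m + 1))


def quantile(q, n, t):
    """Bisection for the q-th posterior quantile; the cdf is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid, n, t) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equal_tail_interval(n, t, alpha):
    """Interval with posterior probability alpha/2 in each tail."""
    return quantile(alpha / 2, n, t), quantile(1 - alpha / 2, n, t)


lower, upper = equal_tail_interval(n=20, t=6, alpha=0.05)
```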


PROBLEMS 11.6 


1. Let $X_1, X_2, \ldots, X_n$ be a sample from a Poisson distribution with unknown
parameter $\lambda$. Assuming that $\lambda$ is a value assumed by a $G(\alpha, \beta)$ rv, find a Bayesian
confidence interval for $\lambda$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from a geometric distribution with parameter
$\theta$. Assuming that $\theta$ has a priori pdf that is given by the density of a $B(\alpha, \beta)$ rv,
find a Bayesian confidence interval for $\theta$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, and suppose that the a priori
pdf for $\mu$ is $U(-1, 1)$. Find a Bayesian confidence interval for $\mu$.


CHAPTER 12 


The General Linear Hypothesis 


12.1 INTRODUCTION 


The last three chapters of this book will be devoted to the study of some 
special topics in mathematical statistics. This chapter deals with the first of 
these, namely, the general linear hypothesis. In a wide variety of problems 
the experimenter is interested in making inferences about a vector parameter. 
For example, he may wish to estimate the mean of a multivariate normal or 
to test some hypotheses concerning the mean vector. The problem of esti- 
mation can be solved, for example, by resorting to the method of maximum 
likelihood estimation, discussed in Section 8.7, or the method of least squares, 
discussed in Section 4.8. In this chapter we restrict ourselves to the so-called 
linear model problems and concern ourselves mainly with problems of hypo- 
theses testing. 

In Section 2 we formally describe the general model and derive a test in
complete generality. In the next four sections we demonstrate the power of
this test by solving four important testing problems. We will need a
considerable amount of linear algebra in Section 2, and the reader may want to
refresh his memory by glancing through Section P.3.


12.2 THE GENERAL LINEAR HYPOTHESIS 


A wide variety of problems of hypotheses testing can be treated under a
general setup. In this section we state the general problem, and derive the
test statistic and its distribution. Consider the following examples.


Example 1. Let $X_1, X_2, \ldots, X_k$ be independent rv's with $EX_i = \mu_i$, $i = 1,
2, \ldots, k$, and common variance $\sigma^2$. Also, $n_i$ observations are taken on $X_i$,
$i = 1, 2, \ldots, k$, and $\sum_{i=1}^{k} n_i = n$. It is required to test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$. The case $k = 2$ has already been treated in
Theorem 9.5.5 and Section 10.4. Problems of this nature arise quite naturally,
for example, in agricultural experiments where one is interested in comparing
the average yield when $k$ fertilizers are available.


Example 2. An experimenter observes the velocity of a particle moving
along a line. He takes observations at given times $t_1, t_2, \ldots, t_n$. Let $\beta_1$ be the
initial velocity of the particle, and $\beta_2$ the acceleration; then the velocity at
time $t$ is given by $x = \beta_1 + \beta_2 t + \varepsilon$, where $\varepsilon$ is an rv that is nonobservable
(like an error in measurement). In practice, the experimenter does not know
$\beta_1$ and $\beta_2$ and has to use the random observations $X_1, X_2, \ldots, X_n$ made at
times $t_1, t_2, \ldots, t_n$, respectively, to obtain some information about the
unknown parameters $\beta_1, \beta_2$.

A similar example is the case where the relation between $x$ and $t$ is
governed by

    $x = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon$,

where $t$ is a mathematical variable, $\beta_0, \beta_1, \beta_2$ are unknown parameters, and
$\varepsilon$ is a nonobservable rv. The experimenter takes observations $X_1, X_2, \ldots, X_n$
at predetermined values $t_1, t_2, \ldots, t_n$, respectively, and is interested in testing
the hypothesis that the relation is in fact linear, that is, $\beta_2 = 0$.


Examples of the type discussed above and their much more complicated
variants can all be treated under a general setup. To fix ideas, let us first
make the following definition.


Definition 1. Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector, and $A$ be an
$n \times k$ matrix, $k < n$, of known constants $a_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$.
We say that the distribution of $X$ satisfies a linear model if

(1) $EX = \beta A'$,

where $\beta = (\beta_1, \beta_2, \ldots, \beta_k)$ is a vector of unknown (scalar) parameters $\beta_1,
\beta_2, \ldots, \beta_k$. It is convenient to write

(2) $X = \beta A' + \varepsilon$,

where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$ is a vector of nonobservable rv's with $E\varepsilon_j = 0$,
$j = 1, 2, \ldots, n$. Relation (2) is known as a linear model. The general linear
hypothesis concerns $\beta$; namely, that $\beta$ satisfies $H_0: \beta H' = 0$, where $H$ is
a known $r \times k$ matrix with $r \le k$.

In what follows we will assume that $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent, normal
rv's with common variance $\sigma^2$ and $E\varepsilon_j = 0$, $j = 1, 2, \ldots, n$. In view of (2),
it follows that $X_1, X_2, \ldots, X_n$ are independent normal rv's with

(3) $EX_i = \sum_{j=1}^{k} a_{ij}\beta_j$ and $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$.

We will assume that $H$ is a matrix of full rank $r$, $r \le k$, and $A$ is a matrix
of full rank $k < n$. Some remarks are in order.


Remark 1. Clearly, $X$ satisfies a linear model if the vector of means
$EX = (EX_1, EX_2, \ldots, EX_n)$ lies in a $k$-dimensional subspace generated by
the linearly independent column vectors $a_1, a_2, \ldots, a_k$ of the matrix $A$.
Indeed, (1) states that $EX$ is a linear combination of the known vectors
$a_1, \ldots, a_k$. The general linear hypothesis $H_0: \beta H' = 0$ states that the
parameters $\beta_1, \beta_2, \ldots, \beta_k$ satisfy $r$ independent homogeneous linear restrictions.
It follows that, under $H_0$, $EX$ lies in a $(k - r)$-dimensional subspace of the
$k$-space generated by $a_1, \ldots, a_k$ (see Section P.3).


Remark 2. The assumption of normality, which is conventional, is made
to compute the likelihood ratio test statistic of $H_0$ and its distribution. If the
problem is to estimate $\beta$, no such assumption is needed. One can use the
principle of least squares and estimate $\beta$ by minimizing the sum of squares,

(4) $\sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon\varepsilon' = (X - \beta A')(X - \beta A')'$.

The minimizing value $\hat{\beta}(x)$ is known as a least square estimate of $\beta$. This
is not a difficult problem, and we will not discuss it here in any detail but
will mention only that any solution of the so-called normal equations

(5) $A'A\beta' = A'X'$

is a least square estimate. If the rank of $A$ is $k$ ($< n$), then $A'A$, which has
the same rank as $A$, is a nonsingular matrix that can be inverted to give
a unique least square estimate

(6) $\hat{\beta}' = (A'A)^{-1}A'X'$.


If the rank of $A$ is $< k$, then $A'A$ is singular and the normal equations do
not have a unique solution. One can show, for example, that $\hat{\beta}$ is unbiased
for $\beta$, and if the $X_i$'s are uncorrelated with common variance $\sigma^2$, the
variance-covariance matrix of the $\hat{\beta}_i$'s is given by

(7) $E\{(\hat{\beta} - \beta)'(\hat{\beta} - \beta)\} = \sigma^2 (A'A)^{-1}$.
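The normal equations (5) are a small linear system that can be solved by elimination. The following sketch forms $A'A$ and $A'x'$ from lists of numbers and solves the system by Gaussian elimination with partial pivoting, assuming $A$ has full rank $k$. The helper names and the toy data, which lie exactly on the line $x = 1 + 2t$, are assumptions for illustration.

```python
def normal_equations(A, x):
    """Return (A'A, A'x') for an n-by-k matrix A (list of rows) and observations x."""
    n, k = len(A), len(A[0])
    AtA = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    Atx = [sum(A[i][p] * x[i] for i in range(n)) for p in range(k)]
    return AtA, Atx


def solve(M, b):
    """Solve M beta = b by Gaussian elimination with partial pivoting (M nonsingular)."""
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, k):
            f = aug[r][c] / aug[c][c]
            for j in range(c, k + 1):
                aug[r][j] -= f * aug[c][j]
    beta = [0.0] * k
    for c in reversed(range(k)):
        s = sum(aug[c][j] * beta[j] for j in range(c + 1, k))
        beta[c] = (aug[c][k] - s) / aug[c][c]
    return beta


def least_squares(A, x):
    """Least square estimate (6), obtained by solving the normal equations (5)."""
    return solve(*normal_equations(A, x))


t = [0.0, 1.0, 2.0, 3.0, 4.0]
A = [[1.0, ti] for ti in t]          # columns: constant term and t
x = [1.0 + 2.0 * ti for ti in t]     # noise-free data on the line 1 + 2t
beta_hat = least_squares(A, x)
```

Because the data are noise-free, the estimate recovers the coefficients of the line up to rounding error.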


Remark 3. One can similarly compute the so-called restricted least square
estimate of $\beta$ by the usual method of Lagrange multipliers. For example,
under $H_0: \beta H' = 0$ one simply minimizes $(X - \beta A')(X - \beta A')'$ subject to
$\beta H' = 0$ to get the restricted least square estimate $\hat{\hat{\beta}}$. The important point
is that, if $\varepsilon$ is assumed to be a multivariate normal rv with mean vector 0
and dispersion matrix $\sigma^2 I_n$, the MLE of $\beta$ is the same as the least square
estimate. In fact, one can show that $\hat{\beta}_i$ is the UMVUE of $\beta_i$, $i = 1,
2, \ldots, k$, by the usual methods.


Example 3. Suppose that a random variable $X$ is linearly related to a
mathematical variable $a$ that is not random. (See Example 2.) Let $X_1,
X_2, \ldots, X_n$ be observations made at different known values $a_1, a_2, \ldots, a_n$ of
$a$. For example, $a_1, a_2, \ldots, a_n$ may represent different levels of a fertilizer,
and $X_1, X_2, \ldots, X_n$, respectively, the corresponding yields of a crop. Also, $\varepsilon_1,
\varepsilon_2, \ldots, \varepsilon_n$ represent unobservable rv's that may be errors of measurements.
Then

    $X_i = \beta_0 + \beta_1 a_i + \varepsilon_i$, $i = 1, 2, \ldots, n$,

and we wish to test whether $\beta_1 = 0$, that is, that the fertilizer levels do not
affect the yield. Here

    $A = \begin{pmatrix} 1 & a_1 \\ 1 & a_2 \\ \vdots & \vdots \\ 1 & a_n \end{pmatrix}$,
    $\beta = (\beta_0, \beta_1)$, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$.

The hypothesis to be tested is $H_0: \beta_1 = 0$ so that, with $H = (0, 1)$, the null
hypothesis can be written as $H_0: H\beta' = 0$. This is a problem of linear
regression.


Similarly, we may assume that the regression of $X$ on $a$ is quadratic:

    $X = \beta_0 + \beta_1 a + \beta_2 a^2 + \varepsilon$,

and we may wish to test that a linear function will be sufficient to describe
the relationship, that is, $\beta_2 = 0$. Here $A$ is the $n \times 3$ matrix

    $A = \begin{pmatrix} 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \\ \vdots & \vdots & \vdots \\ 1 & a_n & a_n^2 \end{pmatrix}$,
    $\beta = (\beta_0, \beta_1, \beta_2)$, $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$,

and $H$ is the $1 \times 3$ matrix $(0, 0, 1)$.

In another example of regression, the $X$'s can be written as

    $X = \beta_1 a + \beta_2 b + \beta_3 c + \varepsilon$,

and we wish to test the hypothesis that $\beta_1 = \beta_2 = \beta_3$. In this case, $A$ is the
matrix

    $A = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ \vdots & \vdots & \vdots \\ a_n & b_n & c_n \end{pmatrix}$,

and $H$ may be chosen to be the $2 \times 3$ matrix

    $H = \begin{pmatrix} 1 & 0 & -1 \\ 1 & -1 & 0 \end{pmatrix}$.

Example 4. Another important example of the general linear hypothesis
involves the analysis of variance. We have already derived tests of hypotheses
regarding the equality of the means of two normal populations when the
variances are equal. In practice one is frequently interested in the equality
of several means when the variances are the same, that is, one has $k$ samples
from $N(\mu_1, \sigma^2), \ldots, N(\mu_k, \sigma^2)$, where $\sigma^2$ is unknown and one wants to test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$. (See Example 1.) Such a situation is of common
occurrence in agricultural experiments. Suppose that $k$ treatments are applied
to experimental units (plots), the $i$th treatment is applied to $n_i$ randomly
chosen units, $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} n_i = n$, and the observation $x_{ij}$ represents
some numerical characteristic (yield) of the $j$th experimental unit under the
$i$th treatment. Suppose also that

    $X_{ij} = \mu_i + \varepsilon_{ij}$, $j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k$,

where $\varepsilon_{ij}$ are iid $N(0, \sigma^2)$ rv's. We are interested in testing $H_0: \mu_1 = \mu_2
= \cdots = \mu_k$. We write

    $X = (X_{11}, X_{12}, \ldots, X_{1n_1}, X_{21}, X_{22}, \ldots, X_{2n_2}, \ldots, X_{k1}, X_{k2}, \ldots, X_{kn_k})$,
    $\beta = (\mu_1, \mu_2, \ldots, \mu_k)$,

    $A = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1_{n_k} \end{pmatrix}$,

where $1_{n_i} = (1, 1, \ldots, 1)'$ is the $n_i$-vector ($i = 1, 2, \ldots, k$), each of whose
elements is unity. Thus $A$ is $n \times k$. We can choose

    $H = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}$,

so that $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ is of the form $H\beta' = 0'$. Here $H$ is a
$(k - 1) \times k$ matrix.

The model described in this example is frequently referred to as a one-way
analysis of variance model. This is a very simple example of an analysis of
variance model. Note that the matrix $A$ is of a very special type, namely,
the elements of $A$ are either 0 or 1. $A$ is known as a design matrix.


Returning to our general model

    $X = \beta A' + \varepsilon$,

we wish to test the null hypothesis $H_0: \beta H' = 0$. We will compute the
likelihood ratio test and the distribution of the test statistic. In order to do
so, we assume that $\varepsilon$ has a multivariate normal distribution with mean vector
0 and variance-covariance matrix $\sigma^2 I_n$, where $\sigma^2$ is unknown and $I_n$ is the
$n \times n$ identity matrix. This means that $X$ has an $n$-variate normal distribution
with mean $\beta A'$ and dispersion matrix $\sigma^2 I_n$ for some $\beta$ and some $\sigma^2$,
both unknown. Here the parameter space $\Theta$ is the set of $(k + 1)$-tuples
$(\beta, \sigma^2) = (\beta_1, \beta_2, \ldots, \beta_k, \sigma^2)$, and the joint pdf of the $X$'s is given by

(8) $f_\theta(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - a_{i1}\beta_1 - \cdots - a_{ik}\beta_k)^2\right\}$
        $= \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left\{-\frac{1}{2\sigma^2}(x - \beta A')(x - \beta A')'\right\}$.

Theorem 1. Consider the linear model

    $X = \beta A' + \varepsilon$,

where $A$ is an $n \times k$ matrix $((a_{ij}))$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, k$, of known
constants and full rank $k < n$, $\beta$ is a vector of unknown parameters $\beta_1,
\beta_2, \ldots, \beta_k$, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$ is a vector of nonobservable independent
normal rv's with common variance $\sigma^2$ and mean $E\varepsilon = 0$. The likelihood
ratio test for testing the linear hypothesis $H_0: \beta H' = 0$, where $H$ is an
$r \times k$ matrix of full rank $r \le k$, is to reject $H_0$ at level $\alpha$ if $F > F_0$,
where $P_{H_0}\{F > F_0\} = \alpha$, and $F$ is the rv given by

(9) $F = \frac{(X - \hat{\hat{\beta}}A')(X - \hat{\hat{\beta}}A')' - (X - \hat{\beta}A')(X - \hat{\beta}A')'}{(X - \hat{\beta}A')(X - \hat{\beta}A')'}$.


In (9), $\hat{\beta}$, $\hat{\hat{\beta}}$ are the MLE's of $\beta$ under $\Theta$ and $\Theta_0$, respectively. Moreover, the
rv $[(n - k)/r]F$ has the F-distribution with $(r, n - k)$ d.f. under $H_0$.


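Proof. The likelihood ratio test of $H_0: \beta H' = 0$ is to reject $H_0$ if and only
if $\lambda(x) < c$, where

(10) $\lambda(x) = \frac{\sup_{\theta \in \Theta_0} f_\theta(x)}{\sup_{\theta \in \Theta} f_\theta(x)}$,

$\theta = (\beta, \sigma^2)$, and $\Theta_0 = \{(\beta, \sigma^2) : \beta H' = 0\}$. Let $\hat{\theta} = (\hat{\beta}, \hat{\sigma}^2)$ be the MLE
of $\theta \in \Theta$, and $\hat{\hat{\theta}} = (\hat{\hat{\beta}}, \hat{\hat{\sigma}}^2)$ be the MLE of $\theta$ under $H_0$, that is, when $\beta H' = 0$.
It is easily seen that $\hat{\beta}$ is the value of $\beta$ that minimizes $(x - \beta A')(x - \beta A')'$,
and

(11) $\hat{\sigma}^2 = n^{-1}(x - \hat{\beta}A')(x - \hat{\beta}A')'$.

Similarly, $\hat{\hat{\beta}}$ is the value of $\beta$ that minimizes $(x - \beta A')(x - \beta A')'$ subject to
$\beta H' = 0$, and

(12) $\hat{\hat{\sigma}}^2 = n^{-1}(x - \hat{\hat{\beta}}A')(x - \hat{\hat{\beta}}A')'$.

It follows that

(13) $\lambda(x) = \left(\frac{\hat{\sigma}^2}{\hat{\hat{\sigma}}^2}\right)^{n/2}$.


The critical region $\lambda(x) < c$ is equivalent to the region $\{\lambda(x)\}^{-2/n} > c^{-2/n}$,
which is of the form

(14) $\frac{\hat{\hat{\sigma}}^2}{\hat{\sigma}^2} > c_1$.


This may be written as

(15) $\frac{(x - \hat{\hat{\beta}}A')(x - \hat{\hat{\beta}}A')'}{(x - \hat{\beta}A')(x - \hat{\beta}A')'} > c_1$,

or, equivalently, as

(16) $\frac{(x - \hat{\hat{\beta}}A')(x - \hat{\hat{\beta}}A')' - (x - \hat{\beta}A')(x - \hat{\beta}A')'}{(x - \hat{\beta}A')(x - \hat{\beta}A')'} > c_1 - 1$.
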
It remains to determine the distribution of the test statistic. For this
purpose it is convenient to reduce the problem to the canonical form. Let $V_n$
be the vector space of the observation vector $X$, $V_k$ be the subspace of $V_n$
generated by the column vectors $a_1, a_2, \ldots, a_k$ of $A$, and $V_{k-r}$ be the
subspace of $V_k$ in which $EX$ is postulated to lie under $H_0$. We change variables
from $X_1, X_2, \ldots, X_n$ to $Z_1, Z_2, \ldots, Z_n$, where $Z_1, Z_2, \ldots, Z_n$ are independent
normal rv's with common variance $\sigma^2$ and means $EZ_i = \omega_i$, $i = 1,
2, \ldots, n$, where $\omega_{k+1} = \cdots = \omega_n = 0$ and, under $H_0$, $\omega_1 = \cdots = \omega_r = 0$.
This is done as follows. Let us choose an orthonormal basis of $k - r$ column
vectors $\{\alpha_i\}$ for $V_{k-r}$, say $\{\alpha_{r+1}, \alpha_{r+2}, \ldots, \alpha_k\}$. We extend this to an
orthonormal basis $\{\alpha_1, \alpha_2, \ldots, \alpha_r, \alpha_{r+1}, \ldots, \alpha_k\}$ for $V_k$, and then extend once
again to an orthonormal basis $\{\alpha_1, \alpha_2, \ldots, \alpha_k, \alpha_{k+1}, \ldots, \alpha_n\}$ for $V_n$. This is
always possible. (See Theorems P.3.2 and P.3.3.)

Let $z_1, z_2, \ldots, z_n$ be the coordinates of $x$ relative to the basis
$\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$. Then $z_i = x\alpha_i$ and $z = xP'$, where $P$ is an orthogonal matrix
with $i$th row $\alpha_i'$. Thus $EZ_i = EX\alpha_i = \beta A'\alpha_i$, and $EZ = \beta A'P'$. Since
$\beta A' \in V_k$ (Remark 1), it follows that $\beta A'\alpha_i = 0$ for $i > k$. Similarly, under
$H_0$, $\beta A' \in V_{k-r} \subset V_k$, so that $\beta A'\alpha_i = 0$ for $i \le r$. Let us write $\omega = \beta A'P'$.
Then $\omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0$, and, under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$.
Finally, from Corollary 2 of Theorem 5.4.6 it follows that $Z_1, Z_2, \ldots, Z_n$ are
independent normal rv's with the same variance $\sigma^2$ and $EZ_i = \omega_i$, $i = 1,
2, \ldots, n$. We have thus transformed the problem to the following simpler
canonical form:

(17) $\Omega$: $Z_i$ are independent $N(\omega_i, \sigma^2)$, $i = 1, 2, \ldots, n$,
         $\omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0$,
     $H_0$: $\omega_1 = \omega_2 = \cdots = \omega_r = 0$.

Now

(18) $(x - \beta A')(x - \beta A')' = (zP - \omega P)(zP - \omega P)'$
        $= (z - \omega)(z - \omega)'$
        $= \sum_{i=1}^{k}(z_i - \omega_i)^2 + \sum_{i=k+1}^{n} z_i^2$.


The quantity $(x - \beta A')(x - \beta A')'$ is minimized if we choose $\hat{\omega}_i = z_i$,
$i = 1, 2, \ldots, k$, so that

(19) $(x - \hat{\beta}A')(x - \hat{\beta}A')' = \sum_{i=k+1}^{n} z_i^2$.

Under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$, so that $(x - \beta A')(x - \beta A')'$ will be
minimized if we choose $\hat{\hat{\omega}}_i = z_i$, $i = r + 1, \ldots, k$. Thus

(20) $(x - \hat{\hat{\beta}}A')(x - \hat{\hat{\beta}}A')' = \sum_{i=1}^{r} z_i^2 + \sum_{i=k+1}^{n} z_i^2$.

It follows that

    $F = \frac{\sum_{i=1}^{r} Z_i^2}{\sum_{i=k+1}^{n} Z_i^2}$.


Now $\sum_{i=k+1}^{n} Z_i^2/\sigma^2$ has a $\chi^2(n - k)$ distribution, and, under $H_0$,
$\sum_{i=1}^{r} Z_i^2/\sigma^2$ has a $\chi^2(r)$ distribution. Since $\sum_{i=1}^{r} Z_i^2$ and
$\sum_{i=k+1}^{n} Z_i^2$ are independent, we see that $[(n - k)/r]F$ is distributed as
$F(r, n - k)$ under $H_0$, as asserted. This completes the proof of the theorem.


Remark 4. In practice, one does not need to find a transformation that
reduces the problem to the canonical form. As will be done in the following
sections, one simply computes the estimates $\hat{\theta}$ and $\hat{\hat{\theta}}$ and then computes the
test statistic in any of the equivalent forms (14), (15), or (16) to apply the
F-test.

Remark 5. The computation of $\hat{\beta}$, $\hat{\hat{\beta}}$ is greatly facilitated, in view of
Remark 3, by using the principle of least squares. Indeed, this was done in the
proof of Theorem 1 when we reduced the problem of maximum likelihood
estimation to that of minimization of the sum of squares $(x - \beta A')(x - \beta A')'$.
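For the one-way layout of Example 4 the two residual sums of squares have simple closed forms, so the statistic in form (16) can be computed without any matrix algebra. The sketch below assumes that the unrestricted estimates are the group means and the restricted estimate is the grand mean, computes $F$ and $[(n-k)/r]F$ with $r = k - 1$, and compares the result with the familiar between-groups over within-groups ratio. The function names and the data are illustrative assumptions.

```python
from statistics import fmean


def anova_f(groups):
    """Return (F, scaled F) for H0: all group means equal, using form (16)."""
    allx = [v for g in groups for v in g]
    n, k = len(allx), len(groups)
    grand = fmean(allx)
    means = [fmean(g) for g in groups]
    # Residual sum of squares under the full model (group means fitted).
    rss_full = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    # Residual sum of squares under H0 (a single common mean fitted).
    rss_null = sum((v - grand) ** 2 for v in allx)
    F = (rss_null - rss_full) / rss_full
    r = k - 1
    return F, (n - k) / r * F


def between_within_ratio(groups):
    """Classical mean-square ratio, used here only as a cross-check."""
    allx = [v for g in groups for v in g]
    n, k = len(allx), len(groups)
    grand = fmean(allx)
    between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups) / (k - 1)
    within = sum((v - fmean(g)) ** 2 for g in groups for v in g) / (n - k)
    return between / within


groups = [[12.0, 15.0, 11.0], [18.0, 20.0, 17.0, 19.0], [14.0, 13.0, 16.0]]
F, scaled_F = anova_f(groups)
check = between_within_ratio(groups)
```

The value `scaled_F` is the quantity to be referred to tables of $F(k - 1, n - k)$.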


Remark 6. The distribution of the test statistic under $H_1$ is easily
determined. We note that $Z_i/\sigma \sim N(\omega_i/\sigma, 1)$ for $i = 1, 2, \ldots, r$, so that
$\sum_{i=1}^{r} Z_i^2/\sigma^2$ has a noncentral chi-square distribution with $r$ d.f. and
noncentrality parameter $\delta = \sum_{i=1}^{r} \omega_i^2/\sigma^2$. It follows that $[(n - k)/r]F$ has a
noncentral F-distribution with d.f. $(r, n - k)$ and noncentrality parameter $\delta$.
Under $H_0$, $\delta = 0$, so that $[(n - k)/r]F$ has a central $F(r, n - k)$
distribution. Since $\sum_{i=1}^{r} \omega_i^2 = \sum_{i=1}^{r} (EZ_i)^2$, it follows from (19) and (20) that, if
we replace each observation $X_i$ by its expected value in the numerator of (16),
we get $\sigma^2\delta$.


Remark 7. The general linear hypothesis makes use of the assumption
of common variance. For instance, in Example 4, $X_{ij} \sim N(\mu_i, \sigma^2)$, $i = 1,
2, \ldots, k$. Let us suppose that $X_{ij} \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, k$. Then we
need to test that $\sigma_1 = \sigma_2 = \cdots = \sigma_k$ before we can apply Theorem 1. The
case $k = 2$ has already been considered in Theorem 9.5.6. For the case
where $k > 2$ one can show that a UMP unbiased test does not exist. A
large-sample approximation is described in Lehmann [70], pages 273-275.
It is beyond the scope of this book to consider the effects of departures
from the underlying assumptions. We refer the reader to Scheffé [111],
Chapter 10, for a discussion of this topic.


PROBLEMS 12.2 


1. Show that any solution of the normal equations (5) minimizes the sum of
squares $(\mathbf{X} - \beta A')(\mathbf{X} - \beta A')'$.

2. Show that the least square estimate given in (6) is an unbiased estimate of $\beta$.
If the rv's $X_i$ are uncorrelated with common variance $\sigma^2$, show that the covariance
matrix of the $\hat\beta_i$'s is given by (7).

3. Under the assumption that $\varepsilon$ [in model (2)] has a multivariate normal
distribution with mean $\mathbf{0}$ and dispersion matrix $\sigma^2 I$, show that the least square
estimates and the MLE's of $\beta$ coincide.


4. Prove statements (11) and (12). 


12.3 THE REGRESSION MODEL


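In this section we consider a simple linear regression model as a special case
of the general linear hypothesis and show how some inferential questions
about the parameters of the regression equation can be answered. Let $t_1,
t_2, \ldots, t_n$ be $n$ given numbers, and suppose that

$$ X_i = \alpha_0 + \alpha_1 t_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \tag{1} $$

where $\alpha_0$, $\alpha_1$ are unknown parameters, and the $\varepsilon_i$ are independent normal rv's
with $E\varepsilon_i = 0$ and $\operatorname{var}(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Also, $\sigma^2$ is assumed to be
unknown. Our object is to test hypotheses concerning $\alpha_0$ and $\alpha_1$, and to
construct confidence intervals for $\alpha_0$ and $\alpha_1$. Rewriting (1) in the usual fashion,
we have

$$ \mathbf{X} = \beta A' + \varepsilon, \tag{2} $$

where

$$ \beta = (\alpha_0, \alpha_1) \quad\text{and}\quad A = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}. $$

Clearly, $X_1, X_2, \ldots, X_n$ are independent normal rv's with $EX_i = \alpha_0 + \alpha_1 t_i$
and $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$, and $\mathbf{X}$ is an $n$-variate normal random
vector with mean $\beta A'$ and dispersion matrix $\sigma^2 I_n$. The joint pdf of $\mathbf{X}$ is given by

$$ f(\mathbf{x}; \alpha_0, \alpha_1, \sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \alpha_0 - \alpha_1 t_i)^2 \right\}. \tag{3} $$

It easily follows that the MLE's for $\alpha_0$, $\alpha_1$, and $\sigma^2$ are given by

$$ \hat\alpha_0 = \frac{\sum_{i=1}^{n} X_i}{n} - \hat\alpha_1 \bar t, \tag{4} $$

$$ \hat\alpha_1 = \frac{\sum_{i=1}^{n} (t_i - \bar t)(X_i - \bar X)}{\sum_{i=1}^{n} (t_i - \bar t)^2}, \tag{5} $$

and

$$ \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2, \tag{6} $$

where $\bar t = n^{-1} \sum_{i=1}^{n} t_i$.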
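As an illustration only, the estimates (4), (5), and (6) can be computed with a few lines of Python. The function name `fit_simple_regression` is hypothetical, and the data are those of Example 1 later in this section; this is a sketch under those assumptions rather than a prescribed procedure.

```python
def fit_simple_regression(t, x):
    """MLEs (alpha0_hat, alpha1_hat, sigma2_hat) for X_i = a0 + a1*t_i + e_i."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    alpha1_hat = s_tx / s_tt                 # equation (5)
    alpha0_hat = x_bar - alpha1_hat * t_bar  # equation (4)
    # Equation (6): the MLE of sigma^2 divides by n, not by n - 2.
    sigma2_hat = sum((xi - alpha0_hat - alpha1_hat * ti) ** 2
                     for ti, xi in zip(t, x)) / n
    return alpha0_hat, alpha1_hat, sigma2_hat


# Data of Example 1 of this section.
t = [0, 1, 2, 3, 4, 5]
x = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
alpha0_hat, alpha1_hat, sigma2_hat = fit_simple_regression(t, x)
```

Note that `sigma2_hat` is the maximum likelihood estimate and is therefore biased; the unbiased version would divide the residual sum of squares by $n - 2$.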


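If we wish to test $H_0: \alpha_1 = 0$, we take $H = (0, 1)$, so that the model is a
special case of the general linear hypothesis with $k = 2$, $r = 1$. Under $H_0$
the MLE's are

$$ \hat{\hat\alpha}_0 = \bar X = \frac{\sum_{i=1}^{n} X_i}{n} \tag{7} $$

and

$$ \hat{\hat\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2. \tag{8} $$

Thus

$$ F = \frac{\sum_{i=1}^{n} (X_i - \bar X)^2 - \sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2}{\sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2} = \frac{\hat\alpha_1^2 \sum_{i=1}^{n} (t_i - \bar t)^2}{\sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2}. \tag{9} $$

From Theorem 12.2.1, the statistic $[(n - 2)/1]F$ has a central $F(1, n - 2)$
distribution under $H_0$. Since $F(1, n - 2)$ is the square of a $t(n - 2)$ rv, the
likelihood ratio test rejects $H_0$ if

$$ \frac{|\hat\alpha_1| \left\{ (n - 2) \sum_{i=1}^{n} (t_i - \bar t)^2 \right\}^{1/2}}{\left\{ \sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2 \right\}^{1/2}} > c_0, \tag{10} $$

where $c_0$ is computed from $t$-tables for $n - 2$ d.f.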
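The test of $\alpha_1 = 0$ based on (10) can be sketched as follows. The function name `slope_t_statistic` is hypothetical, the data are those of Example 1 of this section, and the critical value 2.776 is the tabulated $t_{4,.025}$ quoted in that example; the sketch takes the critical value from tables rather than computing it.

```python
import math


def slope_t_statistic(t, x):
    """Left-hand side of (10): |alpha1_hat| * sqrt((n-2) * S_tt / RSS)."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    alpha1_hat = s_tx / s_tt
    alpha0_hat = x_bar - alpha1_hat * t_bar
    # Residual sum of squares of the unrestricted fit.
    rss = sum((xi - alpha0_hat - alpha1_hat * ti) ** 2 for ti, xi in zip(t, x))
    return abs(alpha1_hat) * math.sqrt((n - 2) * s_tt / rss)


t = [0, 1, 2, 3, 4, 5]
x = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
t_stat = slope_t_statistic(t, x)
reject_h0 = t_stat > 2.776  # t_{4, .025} from the t-tables
```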
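For testing $H_0: \alpha_0 = 0$, we choose $H = (1, 0)$ so that the model is again
a special case of the general linear hypothesis. In this case

$$ \hat{\hat\alpha}_1 = \frac{\sum_{i=1}^{n} t_i X_i}{\sum_{i=1}^{n} t_i^2} $$

and

$$ \hat{\hat\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\hat\alpha}_1 t_i)^2. \tag{11} $$

It follows that

$$ F = \frac{\sum_{i=1}^{n} (X_i - \hat{\hat\alpha}_1 t_i)^2 - \sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2}{\sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2}, \tag{12} $$

and since

$$ \hat{\hat\alpha}_1 = \frac{\sum_{i=1}^{n} (t_i - \bar t)(X_i - \bar X) + n \bar t \bar X}{\sum_{i=1}^{n} t_i^2} = \frac{\hat\alpha_1 \sum_{i=1}^{n} (t_i - \bar t)^2 + n \bar t (\hat\alpha_0 + \hat\alpha_1 \bar t)}{\sum_{i=1}^{n} t_i^2} = \hat\alpha_1 + \frac{n \hat\alpha_0 \bar t}{\sum_{i=1}^{n} t_i^2}, \tag{13} $$

we can write the numerator of $F$ as

$$ \sum_{i=1}^{n} (X_i - \hat{\hat\alpha}_1 t_i)^2 - \sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2 = \sum_{i=1}^{n} \left\{ (X_i - \hat\alpha_0 - \hat\alpha_1 t_i) + \hat\alpha_0 \left( 1 - \frac{n \bar t\, t_i}{\sum_{j=1}^{n} t_j^2} \right) \right\}^2 - \sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2 $$

$$ = \hat\alpha_0^2 \sum_{i=1}^{n} \left( 1 - \frac{n \bar t\, t_i}{\sum_{j=1}^{n} t_j^2} \right)^2 = \frac{\hat\alpha_0^2\, n \sum_{i=1}^{n} (t_i - \bar t)^2}{\sum_{i=1}^{n} t_i^2}, \tag{14} $$

where the cross-product term vanishes because the residuals $X_i - \hat\alpha_0 - \hat\alpha_1 t_i$ sum
to 0 and are orthogonal to the $t_i$. It follows from Theorem 12.2.1 that the statistic

$$ \frac{\hat\alpha_0 \left\{ n \sum_{i=1}^{n} (t_i - \bar t)^2 \big/ \sum_{i=1}^{n} t_i^2 \right\}^{1/2}}{\left\{ \sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2 / (n - 2) \right\}^{1/2}} \tag{15} $$

has a central $t$-distribution with $n - 2$ d.f. under $H_0: \alpha_0 = 0$. The rejection
region is therefore given by

$$ \frac{|\hat\alpha_0| \left\{ n \sum_{i=1}^{n} (t_i - \bar t)^2 \big/ \sum_{i=1}^{n} t_i^2 \right\}^{1/2}}{\left\{ \sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2 / (n - 2) \right\}^{1/2}} > c_0, \tag{16} $$

where $c_0$ is determined from the tables of the $t(n - 2)$ distribution for a given
level of significance $\alpha$.

For testing $H_0: \alpha_0 = \alpha_1 = 0$, we choose $H = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, so that the model
is again a special case of the general linear hypothesis with $r = 2$. In this
case

$$ \hat{\hat\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2 \tag{17} $$

and

$$ F = \frac{\sum_{i=1}^{n} X_i^2 - \sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2}{\sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2} = \frac{n \bar X^2 + \hat\alpha_1^2 \sum_{i=1}^{n} (t_i - \bar t)^2}{\sum_{i=1}^{n} (X_i - \bar X + \hat\alpha_1 \bar t - \hat\alpha_1 t_i)^2} = \frac{n (\hat\alpha_0 + \hat\alpha_1 \bar t)^2 + \hat\alpha_1^2 \sum_{i=1}^{n} (t_i - \bar t)^2}{\sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2}. \tag{18} $$

From Theorem 12.2.1, the statistic $[(n - 2)/2]F$ has a central $F(2, n - 2)$
distribution under $H_0: \alpha_0 = \alpha_1 = 0$. It follows that the level $\alpha$ rejection
region for $H_0$ is given by

$$ \frac{n - 2}{2} F > c_0, \tag{19} $$

where $F$ is given by (18), and $c_0$ is the upper $\alpha$ percent point under the
$F(2, n - 2)$ distribution.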
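The two statistics just derived, (15) for $\alpha_0 = 0$ and $[(n - 2)/2]F$ with $F$ as in (18) for $\alpha_0 = \alpha_1 = 0$, can be evaluated as in the following sketch. The function name `intercept_and_joint_statistics` is hypothetical and the data are those of Example 1 of this section; critical values would still be read from $t$- and $F$-tables.

```python
import math


def intercept_and_joint_statistics(t, x):
    """Return (statistic (15), [(n-2)/2] * F with F as in (18))."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    sum_t2 = sum(ti ** 2 for ti in t)
    alpha1_hat = s_tx / s_tt
    alpha0_hat = x_bar - alpha1_hat * t_bar
    rss = sum((xi - alpha0_hat - alpha1_hat * ti) ** 2 for ti, xi in zip(t, x))
    # Statistic (15): t(n-2) under H0: alpha0 = 0.
    t_intercept = (alpha0_hat * math.sqrt(n * s_tt / sum_t2)
                   / math.sqrt(rss / (n - 2)))
    # Equation (18), then scaled so that it is F(2, n-2) under H0.
    f_joint = (n * x_bar ** 2 + alpha1_hat ** 2 * s_tt) / rss
    return t_intercept, (n - 2) / 2 * f_joint


t = [0, 1, 2, 3, 4, 5]
x = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
t_intercept, f_scaled = intercept_and_joint_statistics(t, x)
```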


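Remark 1. It is quite easy to modify the analysis above to obtain tests of
null hypotheses $\alpha_0 = \alpha_0^0$, $\alpha_1 = \alpha_1^0$, and $(\alpha_0, \alpha_1) = (\alpha_0^0, \alpha_1^0)$, where $\alpha_0^0$, $\alpha_1^0$ are
given real numbers (Problem 4).


Remark 2. The confidence intervals for $\alpha_0$, $\alpha_1$ are also easily obtained. One
can show that a $1 - \alpha$ level confidence interval for $\alpha_0$ is given by

$$ \left( \hat\alpha_0 - t_{n-2,\alpha/2} \left\{ \frac{\sum_{i=1}^{n} t_i^2 \sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2}{n(n - 2) \sum_{i=1}^{n} (t_i - \bar t)^2} \right\}^{1/2},\ \hat\alpha_0 + t_{n-2,\alpha/2} \left\{ \frac{\sum_{i=1}^{n} t_i^2 \sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2}{n(n - 2) \sum_{i=1}^{n} (t_i - \bar t)^2} \right\}^{1/2} \right), \tag{20} $$

and that for $\alpha_1$ is given by

$$ \left( \hat\alpha_1 - t_{n-2,\alpha/2} \left\{ \frac{\sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2}{(n - 2) \sum_{i=1}^{n} (t_i - \bar t)^2} \right\}^{1/2},\ \hat\alpha_1 + t_{n-2,\alpha/2} \left\{ \frac{\sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2}{(n - 2) \sum_{i=1}^{n} (t_i - \bar t)^2} \right\}^{1/2} \right). \tag{21} $$

Similarly, one can obtain confidence sets for $(\alpha_0, \alpha_1)$ from the likelihood
ratio test of $(\alpha_0, \alpha_1) = (\alpha_0^0, \alpha_1^0)$. It can be shown that the collection of sets
of points $(\alpha_0, \alpha_1)$ satisfying

$$ \frac{(n - 2) \left[ n(\hat\alpha_0 - \alpha_0)^2 + 2 n \bar t (\hat\alpha_0 - \alpha_0)(\hat\alpha_1 - \alpha_1) + \sum_{i=1}^{n} t_i^2 (\hat\alpha_1 - \alpha_1)^2 \right]}{2 \sum_{i=1}^{n} (X_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2} \le F_{2,n-2,\alpha} \tag{22} $$

is a $1 - \alpha$ level collection of confidence sets (ellipsoids) for $(\alpha_0, \alpha_1)$ centered
at $(\hat\alpha_0, \hat\alpha_1)$.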
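A minimal sketch of the intervals (20) and (21) follows. The function name `coefficient_intervals` and the argument `t_crit` are hypothetical; `t_crit` stands for the tabulated value $t_{n-2,\alpha/2}$, which the caller is assumed to look up. The data are those of Example 1 of this section, where $t_{4,.025} = 2.776$.

```python
import math


def coefficient_intervals(t, x, t_crit):
    """Confidence intervals (20) for alpha0 and (21) for alpha1."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    sum_t2 = sum(ti ** 2 for ti in t)
    alpha1_hat = s_tx / s_tt
    alpha0_hat = x_bar - alpha1_hat * t_bar
    rss = sum((xi - alpha0_hat - alpha1_hat * ti) ** 2 for ti, xi in zip(t, x))
    half0 = t_crit * math.sqrt(sum_t2 * rss / (n * (n - 2) * s_tt))  # (20)
    half1 = t_crit * math.sqrt(rss / ((n - 2) * s_tt))               # (21)
    return ((alpha0_hat - half0, alpha0_hat + half0),
            (alpha1_hat - half1, alpha1_hat + half1))


t = [0, 1, 2, 3, 4, 5]
x = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
ci_alpha0, ci_alpha1 = coefficient_intervals(t, x, t_crit=2.776)
```

For these data the interval for $\alpha_1$ covers 0 and the interval for $\alpha_0$ covers 1, the values used to generate the sample in Example 1.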


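Remark 3. Sometimes interest lies in constructing a confidence interval on
the unknown linear regression function $E\{X \mid t_0\} = \alpha_0 + \alpha_1 t_0$ for a given
value of $t$, or on a value of $X$, given $t = t_0$. We assume that $t_0$ is a value
of $t$ distinct from $t_1, t_2, \ldots, t_n$. Clearly $\hat\alpha_0 + \hat\alpha_1 t_0$ is the maximum likelihood
estimate of $\alpha_0 + \alpha_1 t_0$. This is also the best linear unbiased estimate. Let
us write $\hat E\{X \mid t_0\} = \hat\alpha_0 + \hat\alpha_1 t_0$. Then

$$ \hat E\{X \mid t_0\} = \bar X - \hat\alpha_1 \bar t + \hat\alpha_1 t_0 = \bar X + (t_0 - \bar t) \frac{\sum_{i=1}^{n} (t_i - \bar t)(X_i - \bar X)}{\sum_{i=1}^{n} (t_i - \bar t)^2}, $$

which is clearly a linear function of normal rv's $X_i$. It follows that $\hat E\{X \mid t_0\}$ is
also normally distributed with mean $E(\hat\alpha_0 + \hat\alpha_1 t_0) = \alpha_0 + \alpha_1 t_0$ and variance

$$ \operatorname{var}\{\hat E\{X \mid t_0\}\} = E\{\hat\alpha_0 - \alpha_0 + \hat\alpha_1 t_0 - \alpha_1 t_0\}^2 = \operatorname{var}(\hat\alpha_0) + t_0^2 \operatorname{var}(\hat\alpha_1) + 2 t_0 \operatorname{cov}(\hat\alpha_0, \hat\alpha_1) = \sigma^2 \left[ \frac{1}{n} + \frac{(\bar t - t_0)^2}{\sum_{i=1}^{n} (t_i - \bar t)^2} \right]. \tag{23} $$

(See Problem 6.) It follows that

$$ \frac{\hat\alpha_0 + \hat\alpha_1 t_0 - \alpha_0 - \alpha_1 t_0}{\sigma \left\{ 1/n + (\bar t - t_0)^2 \big/ \sum_{i=1}^{n} (t_i - \bar t)^2 \right\}^{1/2}} \tag{24} $$

is $\mathcal{N}(0, 1)$. But $\sigma$ is not known, so that we cannot use (24) to construct a
confidence interval for $E\{X \mid t_0\}$. Since $n\hat\sigma^2/\sigma^2$ is a $\chi^2(n - 2)$ rv and $n\hat\sigma^2/\sigma^2$
is independent of $\hat\alpha_0 + \hat\alpha_1 t_0$ (why?), it follows that

$$ \frac{\sqrt{n - 2}\, (\hat\alpha_0 + \hat\alpha_1 t_0 - \alpha_0 - \alpha_1 t_0)}{\hat\sigma \left\{ 1 + n (\bar t - t_0)^2 \big/ \sum_{i=1}^{n} (t_i - \bar t)^2 \right\}^{1/2}} \tag{25} $$

has a $t(n - 2)$ distribution. Thus a $1 - \alpha$ level confidence interval for
$\alpha_0 + \alpha_1 t_0$ is given by

$$ \left( \hat\alpha_0 + \hat\alpha_1 t_0 - t_{n-2,\alpha/2}\, \hat\sigma \left\{ \frac{n}{n - 2} \left[ \frac{1}{n} + \frac{(\bar t - t_0)^2}{\sum_{i=1}^{n} (t_i - \bar t)^2} \right] \right\}^{1/2},\ \hat\alpha_0 + \hat\alpha_1 t_0 + t_{n-2,\alpha/2}\, \hat\sigma \left\{ \frac{n}{n - 2} \left[ \frac{1}{n} + \frac{(\bar t - t_0)^2}{\sum_{i=1}^{n} (t_i - \bar t)^2} \right] \right\}^{1/2} \right). \tag{26} $$

In a similar manner one can show (Problem 7) that

$$ \left( \hat\alpha_0 + \hat\alpha_1 t_0 - t_{n-2,\alpha/2}\, \hat\sigma \left\{ \frac{n}{n - 2} \left[ \frac{n + 1}{n} + \frac{(\bar t - t_0)^2}{\sum_{i=1}^{n} (t_i - \bar t)^2} \right] \right\}^{1/2},\ \hat\alpha_0 + \hat\alpha_1 t_0 + t_{n-2,\alpha/2}\, \hat\sigma \left\{ \frac{n}{n - 2} \left[ \frac{n + 1}{n} + \frac{(\bar t - t_0)^2}{\sum_{i=1}^{n} (t_i - \bar t)^2} \right] \right\}^{1/2} \right) \tag{27} $$

is a $1 - \alpha$ level confidence interval for $X_0 = \alpha_0 + \alpha_1 t_0 + \varepsilon$, that is, for the
estimated value $X_0$ of $X$ at $t_0$.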
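The intervals (26) and (27) differ only in the term $1/n$ versus $(n + 1)/n$ under the square root, which the following sketch makes explicit. The function name `mean_and_prediction_intervals` and the argument `t_crit` (the tabulated $t_{n-2,\alpha/2}$) are hypothetical; the data and the choice $t_0 = 7$ are taken from Example 1 of this section.

```python
import math


def mean_and_prediction_intervals(t, x, t0, t_crit):
    """Return (interval (26) for E{X|t0}, interval (27) for X at t0)."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    alpha1_hat = s_tx / s_tt
    alpha0_hat = x_bar - alpha1_hat * t_bar
    rss = sum((xi - alpha0_hat - alpha1_hat * ti) ** 2 for ti, xi in zip(t, x))
    sigma_hat = math.sqrt(rss / n)             # MLE of sigma (divisor n)
    scale = sigma_hat * math.sqrt(n / (n - 2))
    d = (t_bar - t0) ** 2 / s_tt
    centre = alpha0_hat + alpha1_hat * t0
    half_mean = t_crit * scale * math.sqrt(1 / n + d)        # (26)
    half_pred = t_crit * scale * math.sqrt((n + 1) / n + d)  # (27)
    return ((centre - half_mean, centre + half_mean),
            (centre - half_pred, centre + half_pred))


t = [0, 1, 2, 3, 4, 5]
x = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
ci_mean, ci_pred = mean_and_prediction_intervals(t, x, t0=7, t_crit=2.776)
```

The interval for a new observation is always wider than the interval for the regression function at the same $t_0$, since it also accounts for the error $\varepsilon$ of the new observation.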


Remark 4. The simple regression model (2) considered above can be
generalized in many directions. Thus we may consider $EX$ as a polynomial in
$t$ of a degree higher than 1, or we may regard $EX$ as a function of several
variables. Some of these generalizations will be taken up in the problems.


Remark 5. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate
normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$,
$\operatorname{var}(Y) = \sigma_2^2$, and correlation coefficient $\rho$. In Section 7.6, we computed the pdf of the
sample correlation coefficient $R$ and showed (Remark 7.6.4) that the
statistic

$$ T = R \left( \frac{n - 2}{1 - R^2} \right)^{1/2} \tag{28} $$

has a $t(n - 2)$ distribution, provided that $\rho = 0$. If we wish to test $\rho = 0$,
that is, the independence of two normal rv's, we can base a test on the
statistic $T$. Essentially, we are testing that the population covariance is 0,
which implies that the population regression coefficients are 0. Thus we are
testing, in particular, that $\alpha_1 = 0$. It is therefore not surprising that (28)
is identical with (10). We emphasize that we derived (28) for a bivariate
normal population, but (10) was derived by taking the $t$'s ($X$'s) as fixed
and the distribution of $X$'s ($Y$'s) as normal. Note that for a bivariate normal
population $E(Y \mid x) = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ is linear, in consistency with
our model (1) or (2).
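The identity between (28) and the (signed) statistic in (10) is easy to check numerically. In the sketch below the function names `correlation_t` and `slope_t` are hypothetical, and the data of Example 1 of this section are reused purely as a pair of numeric sequences.

```python
import math


def _sums(u, v):
    """Centered sums of squares and products S_uu, S_vv, S_uv."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    return n, s_uu, s_vv, s_uv


def correlation_t(u, v):
    """Statistic (28): R * sqrt((n-2) / (1 - R^2))."""
    n, s_uu, s_vv, s_uv = _sums(u, v)
    r = s_uv / math.sqrt(s_uu * s_vv)
    return r * math.sqrt((n - 2) / (1 - r * r))


def slope_t(u, v):
    """Signed version of (10): alpha1_hat * sqrt((n-2) * S_uu / RSS)."""
    n, s_uu, s_vv, s_uv = _sums(u, v)
    alpha1_hat = s_uv / s_uu
    rss = s_vv - alpha1_hat ** 2 * s_uu  # residual sum of squares
    return alpha1_hat * math.sqrt((n - 2) * s_uu / rss)


t = [0, 1, 2, 3, 4, 5]
x = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
difference = correlation_t(t, x) - slope_t(t, x)
```

Up to floating-point rounding, `difference` is zero: both expressions reduce to $S_{tx}\sqrt{n - 2}\big/\sqrt{S_{tt}S_{xx} - S_{tx}^2}$.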
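Example 1. Let us assume that the following data satisfy a linear regression
model:

$$ X_i = \alpha_0 + \alpha_1 t_i + \varepsilon_i. $$

t:      0       1       2       3       4       5
x:    .475   1.007    .838   -.618   1.378    .943

Let us test the null hypothesis that $\alpha_1 = 0$. We have

$$ \bar t = 2.5, \qquad \sum (t_i - \bar t)^2 = 17.5, \qquad \bar x = .6705, $$

$$ \sum (t_i - \bar t)(x_i - \bar x) = .9985, $$

$$ \hat\alpha_1 = .0571, \qquad \hat\alpha_0 = \bar x - \hat\alpha_1 \bar t = .5279, $$

$$ \sum (x_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2 = 2.3571, $$

and

$$ |\hat\alpha_1| \left\{ \frac{(n - 2) \sum (t_i - \bar t)^2}{\sum (x_i - \hat\alpha_0 - \hat\alpha_1 t_i)^2} \right\}^{1/2} = .3106. $$

Since $t_{n-2,\alpha/2} = t_{4,.025} = 2.776 > .3106$, we accept $H_0$ at level $\alpha = .05$.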


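Let us next find a 95 percent confidence interval for $E\{X \mid t = 7\}$. This is
given by (26). We have

$$ t_{n-2,\alpha/2}\, \hat\sigma \left\{ \frac{n}{n - 2} \left[ \frac{1}{n} + \frac{(\bar t - t_0)^2}{\sum (t_i - \bar t)^2} \right] \right\}^{1/2} = 2.776 \left( \frac{2.3571}{4} \right)^{1/2} \left( \frac{1}{6} + \frac{(2.5 - 7)^2}{17.5} \right)^{1/2} = 2.4518, $$

$$ \hat\alpha_0 + \hat\alpha_1 t_0 = .5279 + .0571 \times 7 = .9276, $$

so that the 95 percent confidence interval is $(-1.5242, 3.3794)$.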

(The data were produced from Table 6, page 659, of random numbers
with $\mu = 0$, $\sigma = 1$, by letting $\alpha_0 = 1$ and $\alpha_1 = 0$ so that $E\{X \mid t\} =
\alpha_0 + \alpha_1 t = 1$, which surely lies in the interval.)


PROBLEMS 12.3 


1. Prove statements (4), (5), and (6).

2. Prove statements (7) and (8).

3. Prove statement (11).

4. Obtain tests of null hypotheses $\alpha_0 = \alpha_0^0$, $\alpha_1 = \alpha_1^0$, and $(\alpha_0, \alpha_1) = (\alpha_0^0, \alpha_1^0)$, where
$\alpha_0^0$, $\alpha_1^0$ are given real numbers.

5. Obtain the confidence intervals for $\alpha_0$ and $\alpha_1$ as given in (20) and (21),
respectively.

6. Derive the expression for $\operatorname{var}\{\hat E\{X \mid t_0\}\}$ as given in (23).

7. Show that the interval given in (27) is a $1 - \alpha$ level confidence interval for
$X_0 = \alpha_0 + \alpha_1 t_0 + \varepsilon$, the estimated value of $X$ at $t_0$.

8. Suppose that the regression of $X$ on the (mathematical) variable $a$ is a quadratic

$$ X_i = \beta_0 + \beta_1 a_i + \beta_2 a_i^2 + \varepsilon_i, $$

where $\beta_0$, $\beta_1$, $\beta_2$ are unknown parameters, $a_1, a_2, \ldots, a_n$ are known values of $a$,
and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are unobservable rv's that are assumed to be independently
normally distributed with common mean 0 and common variance $\sigma^2$. (See
Example 12.2.3.) Assume that the coefficient vectors $(a_1^k, a_2^k, \ldots, a_n^k)$, $k = 0, 1, 2$, are
linearly independent. Write the normal equations for estimating the $\beta$'s, and
derive the likelihood ratio test of $\beta_2 = 0$.

9. Suppose that the $X$'s can be written as

$$ X_i = \beta_1 a_i + \beta_2 b_i + \beta_3 c_i + \varepsilon_i, $$

where $a$, $b$, $c$ are three mathematical variables, and $\varepsilon_i$ are iid $\mathcal{N}(0, 1)$ rv's. Assuming
that the matrix $A$ (see Example 12.2.3) is of full rank, write the normal equations
and derive the likelihood ratio test of the null hypothesis $H_0: \beta_1 = \beta_2 = \beta_3$.

10. The following table gives the weight $X$ (grams) of a crystal suspended in a
saturated solution against the time suspended $T$ (days).


Time T: 0 1 2 ЗІМ sif 4 (5555 03 {26 vi tt 
Weight X: 4 1 1.1 1.6 1.9 2.3 2.6 


(a) Find the linear regression line of $X$ on $T$.
(b) Test the hypothesis that $\alpha_0 = 0$ in the linear regression model $X_i =
\alpha_0 + \alpha_1 T_i + \varepsilon_i$.
(c) Obtain a .95 level confidence interval for $\alpha_0$.


12.4 ONE-WAY ANALYSIS OF VARIANCE 


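In this section we return to the problem of one-way analysis of variance
considered in Examples 12.2.1 and 12.2.4. Consider the model

$$ X_{ij} = \mu_i + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i;\ i = 1, 2, \ldots, k, \tag{1} $$

as described in Example 12.2.4. In matrix notation we write

$$ \mathbf{X} = \beta A' + \varepsilon, \tag{2} $$

where

$$ \mathbf{X} = (X_{11}, X_{12}, \ldots, X_{1n_1}, X_{21}, X_{22}, \ldots, X_{2n_2}, \ldots, X_{k1}, X_{k2}, \ldots, X_{kn_k}), $$

$$ \beta = (\mu_1, \mu_2, \ldots, \mu_k), $$

$$ A = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_{n_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_{n_k} \end{pmatrix}, $$

$$ \varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n_1}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2n_2}, \ldots, \varepsilon_{k1}, \varepsilon_{k2}, \ldots, \varepsilon_{kn_k}), $$

and $\mathbf{1}_{n_i}$ denotes a column of $n_i$ ones.
As in Example 12.2.4, $\mathbf{X}$ is a vector of $n$ observations ($n = \sum_{i=1}^{k} n_i$), whose
components $X_{ij}$ are subject to random error $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$, $\beta$ is a vector of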
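$k$ unknown parameters, and $A$ is a design matrix. We wish to find a test of
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ against all alternatives. We may write $H_0$ in the
form $H\beta' = \mathbf{0}'$, where $H$ is a $(k - 1) \times k$ matrix of rank $(k - 1)$, which can
be chosen to be, for example,

$$ H = \begin{pmatrix} 1 & 0 & \cdots & 0 & -1 \\ 0 & 1 & \cdots & 0 & -1 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -1 \end{pmatrix}. $$
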
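Let us write $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ under $H_0$. The joint pdf of $\mathbf{X}$ is given by

$$ f(\mathbf{x}; \mu_1, \mu_2, \ldots, \mu_k, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \mu_i)^2 \right\}, \tag{3} $$

and, under $H_0$, by

$$ f(\mathbf{x}; \mu, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \mu)^2 \right\}. \tag{4} $$

It is easy to check that the MLE's are

$$ \hat\mu_i = \bar X_{i\cdot} = \frac{\sum_{j=1}^{n_i} X_{ij}}{n_i}, \qquad i = 1, 2, \ldots, k, \tag{5} $$

$$ \hat\sigma^2 = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X_{i\cdot})^2}{n}, \tag{6} $$

$$ \hat{\hat\mu} = \bar X = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}}{n}, \tag{7} $$

and

$$ \hat{\hat\sigma}^2 = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X)^2}{n}. \tag{8} $$

By Theorem 12.2.1, the likelihood ratio test is to reject $H_0$ if

$$ \frac{n - k}{k - 1} \cdot \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X)^2 - \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X_{i\cdot})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X_{i\cdot})^2} \ge F_0, \tag{9} $$

where $F_0$ is the upper $\alpha$ percent point in the $F(k - 1, n - k)$ distribution.
Since

$$ \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X)^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X_{i\cdot} + \bar X_{i\cdot} - \bar X)^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X_{i\cdot})^2 + \sum_{i=1}^{k} n_i (\bar X_{i\cdot} - \bar X)^2, \tag{10} $$

we may rewrite (9) as

$$ \frac{\sum_{i=1}^{k} n_i (\bar X_{i\cdot} - \bar X)^2 / (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar X_{i\cdot})^2 / (n - k)} \ge F_0. \tag{11} $$

It is usual to call the sum of squares in the numerator of (11) the between sum
of squares (BSS), and the sum of squares in the denominator of (11) the
within sum of squares (WSS). The results are conveniently displayed in a
so-called analysis of variance table in the following form.


                         One-Way Analysis of Variance

Source of                                                      Degrees of   Mean Sum
Variation   Sum of Squares                                     Freedom      of Squares     F-Ratio
Between     BSS = $\sum_i n_i (\bar X_{i\cdot} - \bar X)^2$           k - 1        BSS/(k - 1)    [BSS/(k - 1)] / [WSS/(n - k)]
Within      WSS = $\sum_i \sum_j (X_{ij} - \bar X_{i\cdot})^2$        n - k        WSS/(n - k)
Mean        $n \bar X^2$                                       1
Total       TSS = $\sum_i \sum_j X_{ij}^2$                     n


The third row, designated "Mean," has been included to make the total
of the second column add up to the total sum of squares (TSS), $\sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}^2$.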


Example 1. The lifetimes (in hours) of samples from three different brands
of batteries were recorded with the following results:


          Brand
    X       Y       Z
   40      60      60
   30      40      50
   50      55      70
   50      65      65
   30              75
                   40


We wish to test whether the three brands have different average lifetimes.
We will assume that the three samples come from normal populations
with common (unknown) standard deviation $\sigma$.

From the data $n_1 = 5$, $n_2 = 4$, $n_3 = 6$, $n = 15$, and

$$ \bar x = \frac{200}{5} = 40, \qquad \bar y = \frac{220}{4} = 55, \qquad \bar z = \frac{360}{6} = 60, $$

$$ \sum_{i=1}^{5} (x_i - \bar x)^2 = 400, \qquad \sum_{i=1}^{4} (y_i - \bar y)^2 = 350, \qquad \sum_{i=1}^{6} (z_i - \bar z)^2 = 850. $$

Also, the grand mean is

$$ \frac{200 + 220 + 360}{15} = \frac{780}{15} = 52. $$

Thus

$$ \text{BSS} = 5(40 - 52)^2 + 4(55 - 52)^2 + 6(60 - 52)^2 = 1140, $$

$$ \text{WSS} = 400 + 350 + 850 = 1600. $$

              Analysis of Variance
Source        SS      d.f.    MSS       F-Ratio
Between      1140      2      570       570/133.33 = 4.28
Within       1600     12      133.33


Choosing $\alpha = .05$, we see that $F_0 = F_{2,12,.05} = 3.89$. Thus we reject $H_0$:
$\mu_1 = \mu_2 = \mu_3$ at level $\alpha = .05$.
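The computations of Example 1 can be reproduced with a short function. The name `one_way_anova` is hypothetical, and the sixth observation for the third brand (40) is an assumption inferred from the totals used in the example ($n_3 = 6$, sum 360, sum of squared deviations 850).

```python
def one_way_anova(groups):
    """Return (BSS, WSS, F-ratio) for a list of samples, as in (11)."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between sum of squares: group sizes weight the squared deviations.
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within sum of squares: deviations from each group's own mean.
    wss = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f_ratio = (bss / (k - 1)) / (wss / (n - k))
    return bss, wss, f_ratio


brand_x = [40, 30, 50, 50, 30]
brand_y = [60, 40, 55, 65]
brand_z = [60, 50, 70, 65, 75, 40]  # last value inferred from the totals
bss, wss, f_ratio = one_way_anova([brand_x, brand_y, brand_z])
reject_h0 = f_ratio >= 3.89  # F_{2,12,.05} from the F-tables
```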


Example 2. Three sections of the same elementary statistics course were
taught by three instructors. The final grades of students were recorded as
follows:

[Table of final grades by instructor not recoverable.]

From the data $n_1 = 8$, $n_2 = 10$, $n_3 = 9$, $n = 27$, $\bar x = 70$, $\bar y = 74$, $\bar z = 66$,
$\sum_{i=1}^{8} (x_i - \bar x)^2 = 3168$, $\sum_{i=1}^{10} (y_i - \bar y)^2 = 3686$, $\sum_{i=1}^{9} (z_i - \bar z)^2 = 4898$. Also
the grand mean is

$$ \frac{560 + 740 + 594}{27} = \frac{1894}{27} = 70.15. $$

Thus

$$ \text{BSS} = 8(.15)^2 + 10(3.85)^2 + 9(4.15)^2 = 303.4075, $$

$$ \text{WSS} = 3168 + 3686 + 4898 = 11{,}752. $$

              Analysis of Variance
Source        SS           d.f.    MSS       F-Ratio
Between        303.41       2      151.70    151.70/489.67 = .31
Within      11,752.00      24      489.67


We therefore cannot reject the null hypothesis that the average grades given
by the three instructors are the same.


PROBLEMS 12.4 


1. Prove statements (5), (6), (7), and (8).


2. The following are the coded values of the amounts of corn (in bushels per
acre) obtained from four varieties, using unequal numbers of plots for the different
varieties:

HOUR E ly ef 2454 

B: 3, 4, 2, 3, 4, 2

C: 6, 4, 8

D: 7, 6, 7, 4

Test whether there is a significant difference between the yields of the varieties.


3. A consumer interested in buying a new car has reduced his search to six
different brands: D, F, G, P, V, T. He would like to buy the brand that gives the
highest mileage per gallon of regular gasoline. One of his friends advises him that
he should use some other method of selection, since the average mileages of the
six brands are the same, and offers the following data in support of his assertion:


Distance Traveled (miles) per Gallon of Gasoline


Brand E 
Car D + (F: Gi АР! Cn) Kis TX 
[ШЕЕ УУ: 38 . 28 32 +-ц #30. E ENS 
2 35.71 33. 32.213 036 00 850 а 55 3274 
3 47 8 7738 35 21. 0-28. ier ved 
4 sit 20) 9045 Ce rss aio atio eR 
5.7 à 2 dc жи Ж НЫМДОО 9 Я 
6 19° 

Should the consumer accept his friend's advice?

4. The following data give the ages of entering freshmen in independent random
samples from three different universities.


        University
    A       B       C
   17      16      21
   19      16      23
   20      19      22
   21              20
   18              19


Test the hypothesis that the average ages of entering freshmen at these universities
are the same.


12.5 TWO-WAY ANALYSIS OF VARIANCE WITH ONE OBSERVATION
PER CELL


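In many practical problems one is interested in investigating the effects of
two factors that influence an outcome. For example, the variety of grain
and the type of fertilizer used both affect the yield of a plot; or the score
on a standard examination is influenced by the size of the class and the
instructor.

Let us suppose that two factors affect the outcome of an experiment.
Suppose also that one observation is available at each of a number of levels
of these two factors. Let $X_{ij}$ ($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$) be the
observation when the first factor is at the $i$th level, and the second factor at the
$j$th level. Assume that

$$ X_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a;\ j = 1, 2, \ldots, b, \tag{1} $$

where $\alpha_i$ is the effect of the $i$th level of the first factor, $\beta_j$ is the effect of the
$j$th level of the second factor, and $\varepsilon_{ij}$ is the random error, which is assumed
to be normally distributed with mean 0 and variance $\sigma^2$. We will assume
that the $\varepsilon_{ij}$'s are independent. It follows that $X_{ij}$ are independent normal
rv's with means $\mu + \alpha_i + \beta_j$ and variance $\sigma^2$. There is no loss of generality
in assuming that $\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = 0$, for, if $\mu_{ij} = \mu' + \alpha_i' + \beta_j'$, we can
write

$$ \mu_{ij} = (\mu' + \bar\alpha' + \bar\beta') + (\alpha_i' - \bar\alpha') + (\beta_j' - \bar\beta') = \mu + \alpha_i + \beta_j, $$

and $\sum_{i=1}^{a} \alpha_i = 0$, $\sum_{j=1}^{b} \beta_j = 0$. Here we have written $\bar\alpha'$ and $\bar\beta'$ for the means
of the $\alpha_i'$'s and $\beta_j'$'s, respectively. Thus $X_{ij}$ may denote the yield from the use of
the $i$th variety of some grain and the $j$th type of some fertilizer. The two
hypotheses of interest are

$$ H_\alpha: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \quad\text{and}\quad H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0. $$

The first of these, for example, says that the first factor has no effect on the
outcome of the experiment.

In view of the fact that $\sum_{i=1}^{a} \alpha_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$, $\alpha_a = -\sum_{i=1}^{a-1} \alpha_i$,
$\beta_b = -\sum_{j=1}^{b-1} \beta_j$, and we can write our model in matrix notation as

$$ \mathbf{X} = \beta A' + \varepsilon, \tag{2} $$

where

$$ \mathbf{X} = (X_{11}, X_{12}, \ldots, X_{1b}, X_{21}, X_{22}, \ldots, X_{2b}, \ldots, X_{a1}, X_{a2}, \ldots, X_{ab}), $$

$$ \beta = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1}), $$

$$ \varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1b}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2b}, \ldots, \varepsilon_{a1}, \varepsilon_{a2}, \ldots, \varepsilon_{ab}), $$

and

$$ A = \left[ \begin{array}{c|cccc|cccc}
\mu & \alpha_1 & \alpha_2 & \cdots & \alpha_{a-1} & \beta_1 & \beta_2 & \cdots & \beta_{b-1} \\ \hline
1 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 0 & -1 & -1 & \cdots & -1 \\ \hline
1 & 0 & 1 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 1 & \cdots & 0 & -1 & -1 & \cdots & -1 \\ \hline
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \hline
1 & -1 & -1 & \cdots & -1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & -1 & -1 & \cdots & -1 & -1 & -1 & \cdots & -1
\end{array} \right]. $$

The vector of unknown parameters $\beta$ is $1 \times (a + b - 1)$, and the matrix
$A$ is $ab \times (a + b - 1)$ ($a$ blocks of $b$ rows each). We leave the reader to
check that $A$ is of full rank, $a + b - 1$. The hypothesis $H_\alpha: \alpha_1 = \alpha_2 = \cdots
= \alpha_a = 0$, or $H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0$, can easily be put into the form
$H\beta' = \mathbf{0}'$. For example, for $H_\beta$ we can choose $H$ to be the $(b - 1) \times (a + b - 1)$
matrix of full rank $b - 1$, given by

$$ H = \left[ \begin{array}{cccc|cccc}
0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1
\end{array} \right], $$

where the left block has $a$ columns and the right block is the identity matrix of order $b - 1$.

Clearly, the model described above is a special case of the general linear
hypothesis, and we can use Theorem 12.2.1 to test $H_\beta$.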

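To apply Theorem 12.2.1 we need the estimates $\hat\mu_{ij}$ and $\hat{\hat\mu}_{ij}$. It is easily
checked that

$$ \hat\mu = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} x_{ij}}{ab} = \bar x \tag{3} $$

and

$$ \hat\alpha_i = \bar x_{i\cdot} - \bar x, \qquad \hat\beta_j = \bar x_{\cdot j} - \bar x, \tag{4} $$

where $\bar x_{i\cdot} = b^{-1} \sum_{j=1}^{b} x_{ij}$, $\bar x_{\cdot j} = a^{-1} \sum_{i=1}^{a} x_{ij}$. Also, under $H_\beta$, for example,

$$ \hat{\hat\mu} = \bar x \quad\text{and}\quad \hat{\hat\alpha}_i = \bar x_{i\cdot} - \bar x. \tag{5} $$

In the notation of Theorem 12.2.1, $n = ab$, $k = a + b - 1$, $r = b - 1$,
so that $n - k = ab - a - b + 1 = (a - 1)(b - 1)$, and

$$ F = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot})^2 - \sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2}{\sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2}. \tag{6} $$

Since

$$ \sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot})^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} \{ (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X) + (\bar X_{\cdot j} - \bar X) \}^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2 + a \sum_{j=1}^{b} (\bar X_{\cdot j} - \bar X)^2, \tag{7} $$

we may write

$$ F = \frac{a \sum_{j=1}^{b} (\bar X_{\cdot j} - \bar X)^2}{\sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2}. \tag{8} $$

It follows that, under $H_\beta$, $(a - 1)F$ has a central $F(b - 1, (a - 1)(b - 1))$
distribution.

The numerator of $F$ in (8) measures the variability between the means
$\bar X_{\cdot j}$, and the denominator measures the variability that exists once the effects
due to the two factors have been subtracted.

If $H_\alpha$ is the null hypothesis to be tested, one can show that under $H_\alpha$ the
MLE's are

$$ \hat{\hat\mu} = \bar x \quad\text{and}\quad \hat{\hat\beta}_j = \bar x_{\cdot j} - \bar x. \tag{9} $$

As before, $n = ab$, $k = a + b - 1$, but $r = a - 1$. Also,

$$ F = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{\cdot j})^2 - \sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2}{\sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2}, \tag{10} $$

which may be rewritten as

$$ F = \frac{b \sum_{i=1}^{a} (\bar X_{i\cdot} - \bar X)^2}{\sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2}. \tag{11} $$

It follows that, under $H_\alpha$, $(b - 1)F$ has a central $F(a - 1, (a - 1)(b - 1))$
distribution. The numerator of $F$ in (11) measures the variability between
the means $\bar X_{i\cdot}$.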

If the data are put into the following form:


                          Levels of Factor 2
                     1         2        ...       b        Row Means
              1    $X_{11}$    $X_{12}$    ...    $X_{1b}$     $\bar X_{1\cdot}$
Levels of     2    $X_{21}$    $X_{22}$    ...    $X_{2b}$     $\bar X_{2\cdot}$
Factor 1     ...
              a    $X_{a1}$    $X_{a2}$    ...    $X_{ab}$     $\bar X_{a\cdot}$
Column Means       $\bar X_{\cdot 1}$    $\bar X_{\cdot 2}$    ...    $\bar X_{\cdot b}$     $\bar X$


so that the rows represent various levels of factor 1, and the columns, the
levels of factor 2, one can write

Between sum of squares for rows $= b \sum_{i=1}^{a} (\bar X_{i\cdot} - \bar X)^2$
                                = sum of squares for factor 1
                                = SS$_1$.

Similarly,

Between sum of squares for columns $= a \sum_{j=1}^{b} (\bar X_{\cdot j} - \bar X)^2$
                                   = sum of squares for factor 2
                                   = SS$_2$.

It is usual to write error or residual sum of squares (SSE) for the denominator
of (8) or (11). These results are conveniently presented in an analysis of
variance table as follows.


Two-Way Analysis of Variance Table with One Observation per Cell

Source of    Sum of                      Degrees of
Variation    Squares                     Freedom            Mean Square                         F-Ratio
Rows         SS$_1$                      a - 1              MS$_1$ = SS$_1$/(a - 1)             MS$_1$/MSE
Columns      SS$_2$                      b - 1              MS$_2$ = SS$_2$/(b - 1)             MS$_2$/MSE
Error        SSE                         (a - 1)(b - 1)     MSE = SSE/[(a - 1)(b - 1)]
Mean         $ab \bar X^2$               1                  $ab \bar X^2$
Total        $\sum_i \sum_j X_{ij}^2$    ab                 $\sum_i \sum_j X_{ij}^2 / (ab)$


Example 1. The following table gives the yield (pounds per plot) of three
varieties of wheat, obtained with four different kinds of fertilizers.


                   Variety of Wheat
Fertilizer        A       B       C
  $\alpha$        8       3       7
  $\beta$        10       4       8
  $\gamma$        6       5       6
  $\delta$        8       4       7


Let us test the hypothesis of equality in the average yields of the three
varieties of wheat and the null hypothesis that the four fertilizers are
equally effective.

In our notation, $b = 3$, $a = 4$, $\bar x_{1\cdot} = 6$, $\bar x_{2\cdot} = 7.33$, $\bar x_{3\cdot} = 5.67$, $\bar x_{4\cdot} = 6.33$,
$\bar x_{\cdot 1} = 8$, $\bar x_{\cdot 2} = 4$, $\bar x_{\cdot 3} = 7$, $\bar x = 6.33$.
Also,

SS$_1$ = sum of squares due to fertilizer
       $= 3[(.33)^2 + 1^2 + (.66)^2 + 0^2]$
       $= 4.67$;

SS$_2$ = sum of squares due to variety of wheat
       $= 4[(1.67)^2 + (2.33)^2 + (.67)^2]$
       $= 34.67$;

SSE $= \sum_{i=1}^{4} \sum_{j=1}^{3} (x_{ij} - \bar x_{i\cdot} - \bar x_{\cdot j} + \bar x)^2 = 7.33$.

The results are shown in the following table.

                    Analysis of Variance
Source               SS        d.f.     MS        F-Ratio
Variety of wheat     34.67      2       17.33     14.2
Fertilizer            4.67      3        1.56      1.28
Error                 7.33      6        1.22
Mean                481.33      1      481.33
Total               528.00     12       44.00


Now $F_{2,6,.05} = 5.14$ and $F_{3,6,.05} = 4.76$. Since $14.2 > 5.14$, we reject $H_\beta$,
that there is equality in the average yield of the three varieties; but, since
$1.28 < 4.76$, we accept $H_\alpha$, that the four fertilizers are equally effective.
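The sums of squares and $F$-ratios of Example 1 can be reproduced as in the sketch below. The function name `two_way_anova` is hypothetical. The yield matrix (rows are fertilizers, columns are varieties) is an assumption: its entries were read from the table of the example and checked against the printed row means, column means, and total sum of squares 528.

```python
def two_way_anova(x):
    """One observation per cell: return (SS1, SS2, SSE, F_rows, F_cols)."""
    a, b = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (a * b)
    row_means = [sum(row) / b for row in x]
    col_means = [sum(x[i][j] for i in range(a)) / a for j in range(b)]
    ss1 = b * sum((m - grand) ** 2 for m in row_means)   # factor 1 (rows)
    ss2 = a * sum((m - grand) ** 2 for m in col_means)   # factor 2 (columns)
    sse = sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(a) for j in range(b))
    mse = sse / ((a - 1) * (b - 1))
    f_rows = (ss1 / (a - 1)) / mse   # compare with F(a-1, (a-1)(b-1))
    f_cols = (ss2 / (b - 1)) / mse   # compare with F(b-1, (a-1)(b-1))
    return ss1, ss2, sse, f_rows, f_cols


# Rows: four fertilizers; columns: three varieties of wheat.
yields = [[8, 3, 7],
          [10, 4, 8],
          [6, 5, 6],
          [8, 4, 7]]
ss1, ss2, sse, f_rows, f_cols = two_way_anova(yields)
```

Because the function works with unrounded means, its $F$-ratios can differ in the second decimal place from values obtained by dividing rounded mean squares.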


PROBLEMS 12.5 


1. Show that the matrix $A$ for the model defined in (2) is of full rank, $a + b - 1$.

2. Prove statements (3), (4), (5) and (9).


3. The following data represent the units of production per day turned out by
four different brands of machines used by four machinists:


                     Machinist

Machine       $A_1$    $A_2$    $A_3$    $A_4$
$B_1$          15       14       19       18
$B_2$          17       12       20       16
В, . 16 18 Б а 97. 

В, 16 16 15 15 


Test whether the differences in the performances of the machinists are significant
and also whether the differences in the performances of the four brands of
machines are significant. Use $\alpha = .05$.

4. Students were classified into four ability groups, and three different teaching
methods were employed. The following table gives the means for the four groups:


                 Teaching Method

Ability
   1             15      19      14
   2             18      17      12
   3             22      25      17
   4             17      21      19


Test the hypothesis that the teaching methods yield the same results. That is,
that the teaching methods are equally effective.

12.6 TWO-WAY ANALYSIS OF VARIANCE WITH INTERACTION


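The model described in Section 12.5 assumes that the two factors act
independently, that is, are additive. In practice this is an assumption that needs
testing. In this section we allow for the possibility that the two factors might
jointly affect the outcome, that is, there might be so-called interactions. More
precisely, if $X_{ij}$ is the observation in the $(i, j)$th cell, we will consider the
model

$$ X_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij}, \tag{1} $$

where $\alpha_i$ ($i = 1, 2, \ldots, a$) represent row effects (or effects due to factor 1),
$\beta_j$ ($j = 1, 2, \ldots, b$) represent column effects (or effects due to factor 2), and
$\gamma_{ij}$ represent interactions or joint effects. We will assume that $\varepsilon_{ij}$ are
independently $\mathcal{N}(0, \sigma^2)$. We will further assume that

$$ \sum_{i=1}^{a} \alpha_i = 0 = \sum_{j=1}^{b} \beta_j, \qquad \sum_{j=1}^{b} \gamma_{ij} = 0 \ \text{for all } i, \qquad \sum_{i=1}^{a} \gamma_{ij} = 0 \ \text{for all } j. \tag{2} $$

The hypothesis of interest is

$$ H_\gamma: \gamma_{ij} = 0 \quad\text{for all } i, j. \tag{3} $$

One may also be interested in testing that all $\alpha$'s are 0 or that all $\beta$'s are 0
in the presence of interactions $\gamma_{ij}$.

We first note that (2) is not restrictive since we can write

$$ X_{ij} = \mu' + \alpha_i' + \beta_j' + \gamma_{ij}' + \varepsilon_{ij}, $$

where $\alpha_i'$, $\beta_j'$ and $\gamma_{ij}'$ do not satisfy (2), as

$$ X_{ij} = (\mu' + \bar\alpha' + \bar\beta' + \bar\gamma') + (\alpha_i' - \bar\alpha' + \bar\gamma_{i\cdot}' - \bar\gamma') + (\beta_j' - \bar\beta' + \bar\gamma_{\cdot j}' - \bar\gamma') + (\gamma_{ij}' - \bar\gamma_{i\cdot}' - \bar\gamma_{\cdot j}' + \bar\gamma') + \varepsilon_{ij}, $$

and then (2) is satisfied by choosing

$$ \mu = \mu' + \bar\alpha' + \bar\beta' + \bar\gamma', \qquad \alpha_i = \alpha_i' - \bar\alpha' + \bar\gamma_{i\cdot}' - \bar\gamma', $$

$$ \beta_j = \beta_j' - \bar\beta' + \bar\gamma_{\cdot j}' - \bar\gamma', \qquad \gamma_{ij} = \gamma_{ij}' - \bar\gamma_{i\cdot}' - \bar\gamma_{\cdot j}' + \bar\gamma'. $$

Here

$$ \bar\alpha' = a^{-1} \sum_{i=1}^{a} \alpha_i', \quad \bar\beta' = b^{-1} \sum_{j=1}^{b} \beta_j', \quad \bar\gamma_{i\cdot}' = b^{-1} \sum_{j=1}^{b} \gamma_{ij}', \quad \bar\gamma_{\cdot j}' = a^{-1} \sum_{i=1}^{a} \gamma_{ij}', \quad \bar\gamma' = (ab)^{-1} \sum_{i=1}^{a} \sum_{j=1}^{b} \gamma_{ij}'. $$

Next note that, unless we replicate, that is, take more than one observation
per cell, there are no degrees of freedom left to estimate the error SS. (See
Remark 1.)

Let $X_{ijs}$ be the $s$th observation when the first factor is at the $i$th level, and
the second factor at the $j$th level, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, $s = 1, 2, \ldots,
m$ ($> 1$). Then the model becomes as follows:


                        Levels of Factor 2
Levels of
Factor 1          1            2          ...          b
    1          $X_{111}$     $X_{121}$     ...      $X_{1b1}$
                 ...           ...                    ...
               $X_{11m}$     $X_{12m}$     ...      $X_{1bm}$
    2          $X_{211}$     $X_{221}$     ...      $X_{2b1}$
                 ...           ...                    ...
               $X_{21m}$     $X_{22m}$     ...      $X_{2bm}$
   ...
    a          $X_{a11}$     $X_{a21}$     ...      $X_{ab1}$
                 ...           ...                    ...
               $X_{a1m}$     $X_{a2m}$     ...      $X_{abm}$

$$ X_{ijs} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijs}, \tag{4} $$

$i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $s = 1, 2, \ldots, m$, where the $\varepsilon_{ijs}$'s are
independent $\mathcal{N}(0, \sigma^2)$. We assume that $\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{i=1}^{a} \gamma_{ij} = 0 = \sum_{j=1}^{b}
\gamma_{ij}$. Suppose that we wish to test $H_\alpha: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$. We leave
the reader to check that model (4) is then a special case of the general linear
hypothesis with $n = abm$, $k = ab$, $r = a - 1$, and $n - k = ab(m - 1)$.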


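Let us write

$$ \bar X = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{s=1}^{m} X_{ijs}}{n}, \qquad \bar X_{ij\cdot} = \frac{\sum_{s=1}^{m} X_{ijs}}{m}, \qquad \bar X_{i\cdot\cdot} = \frac{\sum_{j=1}^{b} \sum_{s=1}^{m} X_{ijs}}{mb}, \qquad \bar X_{\cdot j\cdot} = \frac{\sum_{i=1}^{a} \sum_{s=1}^{m} X_{ijs}}{am}. \tag{5} $$

Then it can be easily checked that

$$ \hat\mu = \hat{\hat\mu} = \bar X, \qquad \hat\alpha_i = \bar X_{i\cdot\cdot} - \bar X, \qquad \hat\beta_j = \hat{\hat\beta}_j = \bar X_{\cdot j\cdot} - \bar X, \qquad \hat\gamma_{ij} = \hat{\hat\gamma}_{ij} = \bar X_{ij\cdot} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X. \tag{6} $$

It follows from Theorem 12.2.1 that

$$ F = \frac{\sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot} + \bar X_{i\cdot\cdot} - \bar X)^2 - \sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2}{\sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2}. \tag{7} $$

Since

$$ \sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot} + \bar X_{i\cdot\cdot} - \bar X)^2 = \sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2 + bm \sum_{i=1}^{a} (\bar X_{i\cdot\cdot} - \bar X)^2, $$

we can write (7) as

$$ F = \frac{bm \sum_{i=1}^{a} (\bar X_{i\cdot\cdot} - \bar X)^2}{\sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2}. \tag{8} $$

Under $H_\alpha$ the statistic $[ab(m - 1)/(a - 1)]F$ has the central $F(a - 1,
ab(m - 1))$ distribution, so that the likelihood ratio test rejects $H_\alpha$ if

$$ \frac{ab(m - 1)}{a - 1} \cdot \frac{mb \sum_{i=1}^{a} (\bar X_{i\cdot\cdot} - \bar X)^2}{\sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2} > c. \tag{9} $$

A similar analysis holds for testing $H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0$.

Next consider the test of hypothesis $H_\gamma: \gamma_{ij} = 0$ for all $i, j$, that is, that
the two factors are independent and the effects are additive. In this case
$n = abm$, $k = ab$, $r = (a - 1)(b - 1)$, and $n - k = ab(m - 1)$. It can be
shown that

$$ \hat{\hat\mu} = \bar X, \qquad \hat{\hat\alpha}_i = \bar X_{i\cdot\cdot} - \bar X, \qquad\text{and}\qquad \hat{\hat\beta}_j = \bar X_{\cdot j\cdot} - \bar X. \tag{10} $$

Thus

$$ F = \frac{\sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X)^2 - \sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2}{\sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2}. \tag{11} $$

Now

$$ \sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X)^2 = \sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot} + \bar X_{ij\cdot} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X)^2 = \sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2 + \sum_{i} \sum_{j} \sum_{s} (\bar X_{ij\cdot} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X)^2, $$

so that we may write

$$ F = \frac{\sum_{i} \sum_{j} \sum_{s} (\bar X_{ij\cdot} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X)^2}{\sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2}. \tag{12} $$

Under $H_\gamma$ the statistic $\{(m - 1)ab/[(a - 1)(b - 1)]\}F$ has the $F((a - 1)
(b - 1), ab(m - 1))$ distribution. The likelihood ratio test rejects $H_\gamma$ if

$$ \frac{(m - 1)ab}{(a - 1)(b - 1)} \cdot \frac{m \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar X_{ij\cdot} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X)^2}{\sum_{i} \sum_{j} \sum_{s} (X_{ijs} - \bar X_{ij\cdot})^2} > c. \tag{13} $$

Let us write

SS$_1$ = sum of squares due to factor 1 (row sum of squares)
       $= bm \sum_{i=1}^{a} (\bar X_{i\cdot\cdot} - \bar X)^2$,

SS$_2$ = sum of squares due to factor 2 (column sum of squares)
       $= am \sum_{j=1}^{b} (\bar X_{\cdot j\cdot} - \bar X)^2$,

SSI = sum of squares due to interaction
    $= m \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar X_{ij\cdot} - \bar X_{i\cdot\cdot} - \bar X_{\cdot j\cdot} + \bar X)^2$,

and

SSE = sum of squares due to error (residual sum of squares)
    $= \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{s=1}^{m} (X_{ijs} - \bar X_{ij\cdot})^2$.

Then we may summarize the above results in the following table.


Two-Way Analysis of Variance Table with Interaction

Source of      Sum of                              Degrees of
Variation      Squares                             Freedom            Mean Square                          F-Ratio
Rows           SS$_1$                              a - 1              MS$_1$ = SS$_1$/(a - 1)              MS$_1$/MSE
Columns        SS$_2$                              b - 1              MS$_2$ = SS$_2$/(b - 1)              MS$_2$/MSE
Interaction    SSI                                 (a - 1)(b - 1)     MSI = SSI/[(a - 1)(b - 1)]           MSI/MSE
Error          SSE                                 ab(m - 1)          MSE = SSE/[ab(m - 1)]
Mean           $abm \bar X^2$                      1                  $abm \bar X^2$
Total          $\sum_i \sum_j \sum_s X_{ijs}^2$    abm                $\sum_i \sum_j \sum_s X_{ijs}^2 / (abm)$


Remark 1. Note that, if $m = 1$, there are no d.f. associated with the SSE.
Indeed, the SSE = 0 if $m = 1$. Hence we cannot make tests of hypotheses
when $m = 1$, and for this reason we assume $m > 1$. The more general case
in which we take unequal number of observations per cell is quite easily
treated along the same lines.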


Example 1. To test the effectiveness of three different teaching methods,
three instructors were randomly assigned 12 students each. The students were
then randomly assigned to the different teaching methods and were taught
exactly the same material. At the conclusion of the experiment, identical
examinations were given to the students with the following results in regard
to grades.


                      Instructor
Method          I        II       III
   1           95        60        86
               85        90        77
               74        80        75
               74        70        70
   2           90        89        83
               80        90        70
               92        91        75
               82        86        72
   3           70        68        74
               80        73        86
               85        78        91
               85        93        89

From the data the table of means is as follows:

                        $\bar x_{ij\cdot}$                    $\bar x_{i\cdot\cdot}$
   1            82        75        77         78.0
   2            86        89        75         83.3
   3            80        78        85         81.0
$\bar x_{\cdot j\cdot}$     82.7      80.7      79.0       $\bar x = 80.8$

Then

SS$_1$ = sum of squares due to methods
       $= bm \sum_{i=1}^{a} (\bar x_{i\cdot\cdot} - \bar x)^2$
       $= 3 \times 4 \times 14.13 = 169.56$,

SS$_2$ = sum of squares due to instructors
       $= am \sum_{j=1}^{b} (\bar x_{\cdot j\cdot} - \bar x)^2$
       $= 3 \times 4 \times 6.86 = 82.32$,

SSI = sum of squares due to interaction
    $= m \sum_{i=1}^{3} \sum_{j=1}^{3} (\bar x_{ij\cdot} - \bar x_{i\cdot\cdot} - \bar x_{\cdot j\cdot} + \bar x)^2$
    $= 4 \times 140.45 = 561.80$,

SSE = residual sum of squares
    $= \sum_{i=1}^{3} \sum_{j=1}^{3} \sum_{s=1}^{4} (x_{ijs} - \bar x_{ij\cdot})^2 = 1830.00$.


                 Analysis of Variance

Source           SS         d.f.     MSS       F-Ratio
Methods          169.56      2        84.78     1.25
Instructors       82.32      2        41.16      .61
Interactions     561.80      4       140.45     2.07
Error           1830.00     27        67.78


With $\alpha = .05$, we see from the tables that $F_{2,27,.05} = 3.35$ and $F_{4,27,.05} =
2.73$, so that we cannot reject any of the three hypotheses that the three
methods are equally effective, that the three instructors are equally effective,
and that the interactions are all 0.
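The decomposition into SS$_1$, SS$_2$, SSI, and SSE can be sketched as follows. The function name `two_way_anova_interaction` is hypothetical, and the grade array is an assumption: its entries were read from the table of Example 1, with hard-to-read entries chosen so that the printed cell means and SSE = 1830 are reproduced. Since the function uses unrounded means, SS$_1$ and SS$_2$ come out near 171.56 and 80.89 rather than the 169.56 and 82.32 of the example, which are based on means rounded to one decimal place; the conclusions are unchanged.

```python
def two_way_anova_interaction(x):
    """x[i][j] is the list of m replicates in cell (i, j).

    Returns a dict with SS1, SS2, SSI, SSE and the three F-ratios.
    """
    a, b, m = len(x), len(x[0]), len(x[0][0])
    n = a * b * m
    grand = sum(v for row in x for cell in row for v in cell) / n
    cell = [[sum(c) / m for c in row] for row in x]
    row_means = [sum(cell[i]) / b for i in range(a)]
    col_means = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    ss1 = b * m * sum((r - grand) ** 2 for r in row_means)
    ss2 = a * m * sum((c - grand) ** 2 for c in col_means)
    ssi = m * sum((cell[i][j] - row_means[i] - col_means[j] + grand) ** 2
                  for i in range(a) for j in range(b))
    sse = sum((v - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for v in x[i][j])
    mse = sse / (a * b * (m - 1))
    return {
        "SS1": ss1, "SS2": ss2, "SSI": ssi, "SSE": sse,
        "F_rows": (ss1 / (a - 1)) / mse,
        "F_cols": (ss2 / (b - 1)) / mse,
        "F_inter": (ssi / ((a - 1) * (b - 1))) / mse,
    }


# Rows: teaching methods 1-3; columns: instructors I-III; 4 grades per cell.
grades = [
    [[95, 85, 74, 74], [60, 90, 80, 70], [86, 77, 75, 70]],
    [[90, 80, 92, 82], [89, 90, 91, 86], [83, 70, 75, 72]],
    [[70, 80, 85, 85], [68, 73, 78, 93], [74, 86, 91, 89]],
]
result = two_way_anova_interaction(grades)
```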


PROBLEMS 12.6


1. Prove statement (6).

2. Obtain the likelihood ratio test of the null hypothesis $H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0$.

3. Prove statement (10).

4. Suppose that the following data represent the units of production turned out
each day by three different machinists, each working on the same machine for three
different days:


                       Machinist

Machine         A              B              C
$B_1$        15,15,17       19,19,16       16,18,21
В, Ў 1917,17 1350: 15:1515: 19,22,22 
$B_3$        15,17,16       18,17,16       18,18,18
$B_4$        18,20,22       15,16,17       17,17,17


Using a .05 level of significance, test whether (a) the differences among the
machinists are significant, (b) the differences among the machines are significant, and (c)
the interactions are significant.


CHAPTER 13 


Nonparametric Statistical 
Inference dicen 


13.44 INTRODUCTION 


In all the problems of statistical inference considered so far, we assumed
that the distribution of the random variable being sampled is known except,
perhaps, for some parameters. In practice, however, the functional form of
the distribution is seldom, if ever, known. It is therefore desirable to devise
some procedures that are free of this assumption concerning distribution. In
this chapter we study some procedures that are commonly referred to as
distribution-free or nonparametric methods. The term "distribution-free" refers
to the fact that no assumptions are made about the underlying distribution
except that the distribution function being sampled is absolutely continuous
or purely discrete. The term "nonparametric" refers to the fact that there are
no parameters involved in the traditional sense of the term "parameter" used
thus far. To be sure, there is a parameter which indexes the family of
absolutely continuous df's, but it is not numerical and hence the parameter set
cannot be represented as a subset of $\mathcal{R}_n$ for any $n \geq 1$. The restriction to
absolutely continuous distribution functions is a simplifying assumption that
allows us to use the probability integral transformation (Theorem 5.3.1)
and the fact that ties occur with probability 0.

Section 2 of this chapter is devoted to the problem of unbiased (nonpara- 
metric) estimation. Sections 3 through 5 deal with some common hypotheses 
testing problems. In Section 6 we investigate some applications of order 
statistics in nonparametric methods. Section 7 considers underlying assump- 
tions in some common parametric problems and the effect of relaxation in 
these assumptions.

13.2 NONPARAMETRIC ESTIMATION


We have already encountered some results in nonparametric estimation.
For example, the sample distribution function (defined in Section 7.3) as an
estimate of the population df is distribution free, and so also are the sample
quantiles (defined in Section 7.3) as estimates of corresponding population
quantiles. To estimate moments of a distribution one can use the method of
moments described in Section 8.6. In this section we concentrate mainly on
nonparametric unbiased estimation. We will see that some obvious generalizations
of concepts used in Section 8.4 yield an analogue of the Lehmann-Scheffé
theorem.

Let $X_1, X_2, \ldots, X_n$ be iid rv's with common law $\mathcal{L}(X)$, and let $\mathcal{P}$ be the
class of all possible distributions of $X$ that consists of the absolutely
continuous or discrete distributions, or subclasses of these.


Definition 1. A statistic $T(\mathbf{X})$ is sufficient for the family of distributions $\mathcal{P}$
if the conditional distribution of $\mathbf{X}$, given $T = t$, is the same whatever the
true $F \in \mathcal{P}$.


Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely
continuous df, and let $T = (X_{(1)}, \ldots, X_{(n)})$ be the order statistic. Then

\[ f(\mathbf{x} \mid T = t) = (n!)^{-1}, \]

and we see that $T$ is sufficient for the family of absolutely continuous
distributions on $\mathcal{R}$.


Definition 2. A family of distributions $\mathcal{P}$ is complete if the only unbiased
estimate of 0 is the zero function itself, that is,

\[ E_F h(X) = 0 \ \text{ for all } F \in \mathcal{P} \implies h(x) = 0 \]

for all $x$ (except for a null set with respect to each $F \in \mathcal{P}$).


Definition 3. A statistic $T(\mathbf{X})$ is said to be complete in relation to a class
of distributions $\mathcal{P}$ if the class of induced distributions of $T$ is complete.


We have already encountered many examples of complete statistics or
complete families of distributions in Chapter 8.


The following result is stated without proof. For the proof we refer to
Fraser [33], pages 27-30, 139-142.

Theorem 1. The order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is a complete sufficient
statistic provided that the iid rv's $X_1, X_2, \ldots, X_n$ are of either the discrete or
the continuous type.


Definition 4. A real-valued parameter $g(F)$ is said to be estimable if it has
an unbiased estimate, that is, if there exists a statistic $T(\mathbf{X})$ such that

\[ E_F T(\mathbf{X}) = g(F) \quad \text{for all } F \in \mathcal{P}. \tag{1} \]

Example 2. If $\mathcal{P}$ is the class of all distributions for which the second
moment exists, $\bar{X}$ is an unbiased estimate of $\mu(F)$, the population mean.
Similarly, $\mu_2(F) = \operatorname{var}_F(X)$ is also estimable, and an unbiased estimate is
$S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n - 1)$. We would like to know whether $\bar{X}$ and $S^2$ are
UMVUE's. Similarly, $\bar{X} - \bar{Y}$ is an unbiased estimate of $E_F X - E_G Y$, and $(1/n)$
(number of $X_i$'s $> c$) is an unbiased estimate of $P_F\{X > c\}$, and so on.


Definition 5. The degree $m$ $(m \geq 1)$ of an estimable parameter $g(F)$ is the
smallest sample size for which the parameter is estimable, that is, it is the
smallest $n$ such that there exists an unbiased estimate $T(X_1, X_2, \ldots, X_n)$ with

\[ E_F T = g(F) \quad \text{for all } F \in \mathcal{P}. \]


Example 3. The parameter $g(F) = P_F\{X > c\}$, where $c$ is a known constant,
has degree 1. Also, $\mu(F)$ is estimable with degree 1 (we assume that there is
at least one $F \in \mathcal{P}$ such that $\mu(F) \neq 0$), and $\mu_2(F)$ is estimable with degree
$m = 2$, since $\mu_2(F)$ cannot be estimated (unbiasedly) by one observation only.
At least two observations are needed. Similarly, $\mu^2(F)$ has degree 2.


Definition 6. An unbiased estimate of a parameter based on the minimum 
sample size (equal to degree m) is called a kernel. 


Example 4. Clearly $X_i$ is a kernel of $\mu(F)$; $X_i X_j$, $i \neq j$, is a kernel of $\mu^2(F)$; and
each

\[ T(X_i, X_j) = X_i^2 - X_i X_j, \qquad i, j = 1, 2, \ldots, n \ (i \neq j), \]

is a kernel of $\mu_2(F)$.

Lemma 1. There exists a symmetric kernel for every estimable parameter.

Proof. If $T(X_1, X_2, \ldots, X_m)$ is a kernel of $g(F)$, so also is

\[ T_s(X_1, X_2, \ldots, X_m) = \frac{1}{m!} \sum_{P} T(X_{i_1}, X_{i_2}, \ldots, X_{i_m}), \tag{2} \]

where the summation $P$ is over all $m!$ permutations of $\{1, 2, \ldots, m\}$.

Example 5. A symmetric kernel for $\mu_2(F)$ is

\[ T_s(X_i, X_j) = \tfrac{1}{2}\{T(X_i, X_j) + T(X_j, X_i)\} = \tfrac{1}{2}(X_i - X_j)^2, \qquad i, j = 1, 2, \ldots, n \ (i \neq j). \]


Definition 7. Let $g(F)$ be an estimable parameter of degree $m$, and let
$X_1, X_2, \ldots, X_n$ be a sample of size $n$, $n \geq m$. Corresponding to any kernel
$T(X_{i_1}, \ldots, X_{i_m})$ of $g(F)$, we define a U-statistic for the sample by

\[ U(X_1, X_2, \ldots, X_n) = \binom{n}{m}^{-1} \sum_{C} T_s(X_{i_1}, X_{i_2}, \ldots, X_{i_m}), \tag{3} \]

where the summation $C$ is over all $\binom{n}{m}$ combinations of $m$ integers
$(i_1, i_2, \ldots, i_m)$ chosen from $\{1, 2, \ldots, n\}$, and $T_s$ is the symmetric kernel
defined in (2).


Clearly the U-statistic defined in (3) is symmetric in the $X_i$'s, and

\[ E_F U(\mathbf{X}) = g(F) \quad \text{for all } F. \tag{4} \]


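Example 6. For estimating $\mu(F)$, the U-statistic is $n^{-1} \sum_{i=1}^{n} X_i$. For estimating
$\mu_2(F)$, a symmetric kernel is

\[ T_s(X_{i_1}, X_{i_2}) = \tfrac{1}{2}(X_{i_1} - X_{i_2})^2, \qquad i_1, i_2 = 1, 2, \ldots, n \ (i_1 \neq i_2), \]

so that the corresponding U-statistic is

\[ U(\mathbf{X}) = \binom{n}{2}^{-1} \sum_{i_1 < i_2} \tfrac{1}{2}(X_{i_1} - X_{i_2})^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = S^2. \]

Similarly, for estimating $\mu^2(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}) = X_{i_1} X_{i_2}$,
and the corresponding U-statistic is

\[ U(\mathbf{X}) = \binom{n}{2}^{-1} \sum_{i < j} X_i X_j = \frac{1}{n(n - 1)} \sum_{i \neq j} X_i X_j. \]

For estimating $\mu^3(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}, X_{i_3}) = X_{i_1} X_{i_2} X_{i_3}$, so
that the corresponding U-statistic is

\[ U(\mathbf{X}) = \binom{n}{3}^{-1} \sum_{i < j < k} X_i X_j X_k = \frac{1}{n(n - 1)(n - 2)} \sum_{i \neq j \neq k} X_i X_j X_k. \]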
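
As a numerical check on Example 6, the following Python sketch averages a symmetric kernel over all subsets of the sample and confirms that the kernel $\tfrac{1}{2}(x - y)^2$ reproduces the sample variance. The function names and the data values are illustrative assumptions, not part of the text.

```python
from itertools import combinations
from math import comb, isclose
from statistics import mean, variance


def u_statistic(sample, kernel, m):
    """Average a symmetric kernel of degree m over all size-m subsets."""
    total = sum(kernel(*subset) for subset in combinations(sample, m))
    return total / comb(len(sample), m)


def variance_kernel(x, y):
    # Symmetric kernel of degree 2 for the variance: (x - y)^2 / 2.
    return 0.5 * (x - y) ** 2


data = [2.0, 4.0, 4.0, 7.0, 9.0]          # hypothetical observations
u_mean = u_statistic(data, lambda x: x, 1)  # kernel X_i gives the sample mean
u_var = u_statistic(data, variance_kernel, 2)

print(u_mean, mean(data))
print(u_var, variance(data))                # variance() uses divisor n - 1
```

Enumerating all $\binom{n}{m}$ subsets is only practical for small $n$; the sketch is meant to mirror definition (3) directly rather than to be efficient.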


The following result shows the importance of the U-statistic. 

Theorem 2. Let $\mathcal{P}$ be the class of all absolutely continuous or all purely
discrete distribution functions on $\mathcal{R}$. Any estimable function $g(F)$, $F \in \mathcal{P}$,
has a unique estimate that is unbiased and symmetric in the observations
and has uniformly minimum variance among all unbiased estimates.


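Proof. Let $X_1, X_2, \ldots, X_n$ be a random sample from $F$, $F \in \mathcal{P}$, and let
$T(X_1, X_2, \ldots, X_n)$ be an unbiased estimate of $g(F)$. Consider the set of all $n!$
permutations of $\{1, 2, \ldots, n\}$ and suitably index them. Let $\{i_1, i_2, \ldots, i_n\}$
be the $i$th member in this set, and let

\[ T_i = T_i(X_1, X_2, \ldots, X_n) = T(X_{i_1}, X_{i_2}, \ldots, X_{i_n}), \qquad i = 1, 2, \ldots, n!. \]

Let $\bar{T} = \sum_{i=1}^{n!} T_i / n!$. Clearly $E_F \bar{T} = g(F)$, and

\[
\begin{aligned}
\operatorname{var}(\bar{T}) &= E\Big\{\frac{1}{n!} \sum_{i=1}^{n!} T_i\Big\}^2 - [g(F)]^2 \\
&\leq E\Big\{\frac{1}{n!} \sum_{i=1}^{n!} T_i^2\Big\} - [g(F)]^2 \\
&= E T^2 - [g(F)]^2 \\
&= \operatorname{var}(T).
\end{aligned}
\]

The equality holds if and only if

\[ T_i(X_1, X_2, \ldots, X_n) = a, \qquad i = 1, 2, \ldots, n!, \]

for all points in the sample space (except perhaps for a null set), where $a$
is a constant. It follows that $T(\mathbf{X})$ is symmetric in the arguments $X_1,
X_2, \ldots, X_n$ with probability 1 and $T$ is identical with $\bar{T}$. Uniqueness follows
from Theorem 1 and Problem 4.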


Corollary. If $T(X_1, X_2, \ldots, X_n)$ is unbiased for $g(F)$, $F \in \mathcal{P}$, the
corresponding U-statistic is an essentially unique UMVUE.


Remark 1. According to Theorem 2, we need only consider estimates that
are symmetric in the observations, and all we need to do is to make them
unbiased. This procedure leads to an unbiased estimate with the least
variance in the class of all unbiased estimates of the parameter. For example,
as a consequence of Theorem 2, $\bar{X}$ and $S^2$ are unique UMVUE's of
$\mu(F)$ and $\mu_2(F)$, respectively.


Remark 2. The reader should go back and compare this result (Theorem 2)
with the corresponding result in the parametric case (Theorem 8.4.5). See 
also Remark 8.3.12. 
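
Example 7. Let $\mathcal{P}$ be the class of all absolutely continuous df's, and let
$X_1, X_2, \ldots, X_n$ be a sample of size $n$. To estimate $g(F) = P_F\{X_1 > c\}$, where
$c$ is a fixed constant, define

\[ Y_i = \begin{cases} 1 & \text{if } X_i > c, \\ 0 & \text{if } X_i \leq c, \end{cases} \qquad i = 1, 2, \ldots, n. \]

Consider

\[ T(Y_1, Y_2, \ldots, Y_n) = \sum_{i=1}^{n} a_i Y_i \]

as an estimate of $g(F)$. To find the UMVUE of $g$ we symmetrize $T$ in the
$Y$'s. This happens if $a_i = a$, $i = 1, 2, \ldots, n$, and $T(\mathbf{Y}) = a \sum_{i=1}^{n} Y_i$. For $T$ to
be unbiased, $E_F T = a \sum_{i=1}^{n} E_F Y_i = a n g(F)$, so that $a = n^{-1}$. Thus $n^{-1} \sum_{i=1}^{n} Y_i$
is the UMVUE, and

\[ \operatorname{var}_F(T) = \frac{g(F)[1 - g(F)]}{n} \leq \frac{1}{4n}. \]

Also, $\sum_{i=1}^{n} Y_i$ is binomial, so that

\[ \frac{\sqrt{n}\,[T - g(F)]}{\{g(F)[1 - g(F)]\}^{1/2}} \xrightarrow{L} Z \quad \text{as } n \to \infty, \]

where $Z \sim \mathcal{N}(0, 1)$. This result can be used to find confidence bounds on
$g(F)$.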

Example 8 (Lehmann [68], 3-24 to 3-26, and Fraser [33], 164-167). Let $\mathcal{P}$ be
the class of all absolutely continuous distributions on the real line. For $F, G
\in \mathcal{P}$, we define a distance function $\Delta(F, G)$ as follows:

\[ \Delta(F, G) = \int_{-\infty}^{\infty} [F(x) - G(x)]^2 \, \frac{F'(x) + G'(x)}{2} \, dx. \tag{5} \]

We show that $\Delta(F, G) = 0$ if and only if $F = G$. Clearly, $\Delta(F, F) = 0$.
Conversely, let $F(x) \neq G(x)$ for some $x_1$, and suppose that $F(x_1) -
G(x_1) = d > 0$. Since $F$ and $G$ are (absolutely) continuous df's, there exists
an $x_0 < x_1$ such that $F(x_0) - G(x_0) = d/2$ and $F(x) - G(x) > d/2$ for
$x_0 < x \leq x_1$. Since $F$ and $G$ are both nondecreasing, at least one of $F$ and
$G$ must increase by at least $d/2$ as $x$ varies from $x_0$ to $x_1$. Thus

\[ \Delta(F, G) \geq \int_{x_0}^{x_1} [F(x) - G(x)]^2 \, \frac{F'(x) + G'(x)}{2} \, dx \geq \Big(\frac{d}{2}\Big)^2 \cdot \frac{1}{2} \cdot \frac{d}{2} > 0. \]

Let $X_1, X_2, \ldots, X_m$ be a sample from $F$, and $Y_1, Y_2, \ldots, Y_n$ be an
independent sample from $G$, where $F, G \in \mathcal{P}$. We wish to find the UMVUE of
$\Delta(F, G)$. We first show that


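\[
\begin{aligned}
g(F, G) &= P\{[\max(X_1, X_2) < \min(Y_1, Y_2)] \cup [\max(Y_1, Y_2) < \min(X_1, X_2)]\} \\
&= \tfrac{1}{3} + 2\Delta(F, G).
\end{aligned} \tag{6}
\]

We have

\[ g(F, G) = P\{\max(X_1, X_2) < \min(Y_1, Y_2)\} + P\{\max(Y_1, Y_2) < \min(X_1, X_2)\} \]

and

\[ P\{\max(X_1, X_2) \leq x\} = F^2(x), \qquad P\{\min(Y_1, Y_2) > y\} = [1 - G(y)]^2. \]

Let us write $F'(x)$ and $G'(y)$ for the pdf's of $F$ and $G$, respectively. Then

\[
\begin{aligned}
g(F, G) &= \int_{-\infty}^{\infty} [1 - G(y)]^2 \, 2F(y) F'(y) \, dy + \int_{-\infty}^{\infty} [1 - F(x)]^2 \, 2G(x) G'(x) \, dx \\
&= \int_{-\infty}^{\infty} [1 + G^2(y) - 2G(y)] \, 2F(y) F'(y) \, dy + \int_{-\infty}^{\infty} [1 + F^2(x) - 2F(x)] \, 2G(x) G'(x) \, dx \\
&= 2 + \int_{-\infty}^{\infty} 2\big[G^2(x) F(x) F'(x) + F^2(x) G(x) G'(x) - 2F(x) G(x) (F'(x) + G'(x))\big] \, dx \\
&= 2 + \int_{-\infty}^{\infty} d[F^2(x) G^2(x)] - 4 \int_{-\infty}^{\infty} F(x) G(x) [F'(x) + G'(x)] \, dx \\
&= 3 - \int_{-\infty}^{\infty} \big\{[F(x) + G(x)]^2 - [F(x) - G(x)]^2\big\} [F'(x) + G'(x)] \, dx \\
&= 3 - \tfrac{1}{3} [F(x) + G(x)]^3 \Big|_{-\infty}^{\infty} + 2\Delta(F, G) \\
&= 3 - \tfrac{8}{3} + 2\Delta(F, G),
\end{aligned}
\]

which is (6).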
To use Theorem 2, let us define

\[ \varphi(X_1, X_2, Y_1, Y_2) = \begin{cases} 1 & \text{if } \max(X_1, X_2) < \min(Y_1, Y_2) \text{ or } \max(Y_1, Y_2) < \min(X_1, X_2), \\ 0 & \text{otherwise.} \end{cases} \tag{7} \]
Then $\varphi(X_1, X_2, Y_1, Y_2)$ is an unbiased estimate of $g(F, G)$, and indeed it is
a kernel of $g(F, G)$. The corresponding U-statistic therefore must be the
UMVUE. We have

\[ U(\mathbf{X}, \mathbf{Y}) = \binom{m}{2}^{-1} \binom{n}{2}^{-1} \sum_{i < j} \sum_{k < l} \varphi(X_i, X_j, Y_k, Y_l), \tag{8} \]

so that $U$ is the UMVUE of $g(F, G)$, and the UMVUE of $\Delta(F, G)$ is

\[ \hat{\Delta}(\mathbf{X}, \mathbf{Y}) = \tfrac{1}{2} U(\mathbf{X}, \mathbf{Y}) - \tfrac{1}{6}. \tag{9} \]


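Example 9 (Lehmann [68], 3-24, and Fraser [33], 162). Let $\mathcal{P}$ be the
class of all absolutely continuous df's on the real line, and let $X_1,
X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from $F$ and
$G$, respectively, $F, G \in \mathcal{P}$. We wish to estimate

\[ p(F, G) = P\{X < Y\}. \]

Let us define

\[ Z_{ij} = \begin{cases} 1 & \text{if } X_i < Y_j, \\ 0 & \text{if } X_i \geq Y_j, \end{cases} \]

for each pair $X_i, Y_j$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$.

Then $\sum_{i=1}^{m} Z_{ij}$ is the number of $X$'s $< Y_j$, and $\sum_{j=1}^{n} Z_{ij}$ is the number of
$Y$'s $> X_i$. Mann and Whitney [78] suggest the use of the estimate $U/mn$,
where

\[ U = \sum_{i=1}^{m} \sum_{j=1}^{n} Z_{ij} \]

and

\[ EU = mn \, EZ_{ij} = mn \, P\{X < Y\}. \]

Thus

\[ \hat{p}(F, G) = \frac{U}{mn} \]

is unbiased for $p$. Moreover, $\hat{p}$ is symmetric in the $X$'s and $Y$'s, so that it
has minimum variance. To compute the minimum variance, we have

\[ EU^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{h=1}^{m} \sum_{k=1}^{n} E(Z_{ij} Z_{hk}), \]

where

\[ Z_{ij} Z_{hk} = \begin{cases} 1 & \text{if } X_i < Y_j \text{ and } X_h < Y_k, \\ 0 & \text{otherwise,} \end{cases} \]

so that

\[
E(Z_{ij} Z_{hk}) = P\{X_i < Y_j, X_h < Y_k\} =
\begin{cases}
\int F(x) G'(x) \, dx & \text{if } i = h, \ j = k, \\
\int [1 - G(x)]^2 F'(x) \, dx & \text{if } i = h, \ j \neq k, \\
\int F^2(x) G'(x) \, dx & \text{if } i \neq h, \ j = k, \\
\big(\int F(x) G'(x) \, dx\big)^2 & \text{if } i \neq h, \ j \neq k.
\end{cases}
\]

There are $mn$ terms with $i = h$, $j = k$; $m(m - 1)n$ terms with $i \neq h$,
$j = k$; $mn(n - 1)$ terms with $i = h$, $j \neq k$; and $m(m - 1)n(n - 1)$ terms with
$i \neq h$, $j \neq k$. It follows that

\[
\begin{aligned}
EU^2 &= mn \int F(x) G'(x) \, dx + mn(n - 1) \int [1 - G(x)]^2 F'(x) \, dx \\
&\quad + m(m - 1)n \int F^2(x) G'(x) \, dx \\
&\quad + m(m - 1)n(n - 1) \Big(\int F(x) G'(x) \, dx\Big)^2,
\end{aligned}
\]

which leads to the variance of $U$. In particular, if $F = G$, then

\[ \operatorname{var} U = \frac{mn(m + n + 1)}{12}. \]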
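
The counting definition of $U$ in Example 9 can be sketched in a few lines of Python. The two samples below are hypothetical, and the helper name `mann_whitney_u` is an assumption made for illustration; the last line evaluates the variance formula for the case $F = G$.

```python
def mann_whitney_u(xs, ys):
    """Count the pairs (x_i, y_j) with x_i < y_j."""
    return sum(1 for x in xs for y in ys if x < y)


xs = [1.1, 2.3, 4.0]         # hypothetical sample from F
ys = [1.9, 3.5, 5.2, 6.0]    # hypothetical sample from G
m, n = len(xs), len(ys)

u = mann_whitney_u(xs, ys)
p_hat = u / (m * n)                    # unbiased estimate of P{X < Y}
var_u_null = m * n * (m + n + 1) / 12  # variance of U when F = G

print(u, p_hat, var_u_null)
```

The double loop costs $mn$ comparisons, which is adequate for small samples; ties are ignored here because the distributions are assumed to be absolutely continuous.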


PROBLEMS 13.2 


1. Let $(\mathcal{R}, \mathfrak{B}, P_\theta)$ be a probability space, and let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. Let $A$ be a Borel
subset of $\mathcal{R}$, and consider the parameter $d(\theta) = P_\theta(A)$. Is $d$ estimable? If so, what
is the degree? Find the UMVUE for $d$, based on a sample of size $n$, assuming that
$\mathcal{P}$ is the class of all continuous distributions.


2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
two absolutely continuous df's. Find the UMVUE's of (a) $E(XY)$ and (b)
$\operatorname{var}(X + Y)$.


3. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from an absolutely
continuous distribution. Find the UMVUE's of (a) $E(XY)$ and (b) $\operatorname{var}(X + Y)$.


4. Let $T(X_1, X_2, \ldots, X_n)$ be a statistic that is symmetric in the observations.
Show that $T$ can be written as a function of the order statistic. Conversely, if
$T(X_1, X_2, \ldots, X_n)$ can be written as a function of the order statistic, $T$ is symmetric
in the observations.


13.3 SOME SINGLE-SAMPLE PROBLEMS 


In this section we consider some tests of nonparametric hypotheses when a
sample of $n$ iid observations $X_1, X_2, \ldots, X_n$ is available.

The Problem of Fit


The problem of fit is to test the hypothesis that the sample of observations
$X_1, X_2, \ldots, X_n$ is from some specified distribution against the alternative that
it is from some other distribution. Thus the null hypothesis $H_0$: $X_i \sim F_0$ is
simple and is to be tested against the composite alternative $H_1$: $X_i \sim F$,
where $F(x) \neq F_0(x)$ for some $x$. In Section 10.3 we studied the chi-square
test of goodness of fit of $H_0$.

Another test used for testing $H_0$ is the Kolmogorov-Smirnov test. In Section
7.3 we defined the sample (empirical) df $F_n^*$ of $X_1, X_2, \ldots, X_n$.


Definition 1. Let $X_1, X_2, \ldots, X_n$ be a sample from a df $F$, and let $F_n^*$ be the
corresponding empirical df. The statistic

\[ D_n = \sup_x |F_n^*(x) - F(x)| \tag{1} \]

is called the (two-sided) Kolmogorov-Smirnov statistic. We write

\[ D_n^+ = \sup_x [F_n^*(x) - F(x)] \tag{2} \]

and

\[ D_n^- = \sup_x [F(x) - F_n^*(x)], \tag{3} \]

and call $D_n^+$, $D_n^-$ the one-sided Kolmogorov-Smirnov statistics.


Theorem 1. The statistics $D_n$, $D_n^+$, $D_n^-$ are distribution free for any
continuous df $F$.


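Proof. Clearly, $D_n = \max(D_n^+, D_n^-)$. Let $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$ be the
order statistics of $X_1, X_2, \ldots, X_n$, and define $X_{(0)} = -\infty$, $X_{(n+1)} = +\infty$.
Then

\[ F_n^*(x) = \frac{i}{n} \quad \text{for } X_{(i)} \leq x < X_{(i+1)}, \quad i = 0, 1, 2, \ldots, n, \]

and we have

\[
\begin{aligned}
D_n^+ &= \max_{0 \leq i \leq n} \ \sup_{X_{(i)} \leq x < X_{(i+1)}} \Big[\frac{i}{n} - F(x)\Big] \\
&= \max_{0 \leq i \leq n} \Big[\frac{i}{n} - \inf_{X_{(i)} \leq x < X_{(i+1)}} F(x)\Big] \\
&= \max_{0 \leq i \leq n} \Big[\frac{i}{n} - F(X_{(i)})\Big] \\
&= \max\Big\{\max_{1 \leq i \leq n} \Big[\frac{i}{n} - F(X_{(i)})\Big], \ 0\Big\}.
\end{aligned}
\]

Since $F(X_{(i)})$ is the $i$th-order statistic of a sample from $U(0, 1)$ irrespective of
what $F$ is, as long as it is continuous, we see that the distribution of $D_n^+$ is
independent of $F$. Similarly,

\[ D_n^- = \max\Big\{\max_{1 \leq i \leq n} \Big[F(X_{(i)}) - \frac{i - 1}{n}\Big], \ 0\Big\}, \]

and the result follows.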


Without loss of generality, therefore, we assume that F is the df of a 
U(0, 1) rv. 
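
The order-statistic expressions for $D_n^+$ and $D_n^-$ obtained in the proof of Theorem 1 translate directly into a short computation. The Python sketch below is a minimal illustration under the assumption that the hypothesized df is continuous; the function names and the three sample values are hypothetical.

```python
def ks_statistics(sample, cdf):
    """Return (D_n, D_n_plus, D_n_minus) for a continuous hypothesized cdf."""
    u = sorted(cdf(x) for x in sample)   # F(X_(1)) <= ... <= F(X_(n))
    n = len(u)
    # D_n^+ = max{ max_i [i/n - F(X_(i))], 0 }, with i running from 1 to n.
    d_plus = max(max((i + 1) / n - u[i] for i in range(n)), 0.0)
    # D_n^- = max{ max_i [F(X_(i)) - (i - 1)/n], 0 }.
    d_minus = max(max(u[i] - i / n for i in range(n)), 0.0)
    return max(d_plus, d_minus), d_plus, d_minus


def uniform_cdf(x):
    """df of a U(0, 1) rv."""
    return min(max(x, 0.0), 1.0)


d_n, d_plus, d_minus = ks_statistics([0.7, 0.1, 0.4], uniform_cdf)
print(d_n, d_plus, d_minus)
```

Because only the values $F(X_{(i)})$ enter the computation, the same function applies to any continuous $F$, which is the content of the theorem.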


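Theorem 2. If $F$ is continuous, then

\[
P\Big\{D_n < v + \frac{1}{2n}\Big\} =
\begin{cases}
0 & \text{if } v \leq 0, \\[4pt]
\displaystyle \int_{(1/2n) - v}^{(1/2n) + v} \int_{(3/2n) - v}^{(3/2n) + v} \cdots \int_{[(2n-1)/2n] - v}^{[(2n-1)/2n] + v} f(u_1, u_2, \ldots, u_n) \, du_n \cdots du_1 & \text{if } 0 < v < \dfrac{2n - 1}{2n}, \\[4pt]
1 & \text{if } v \geq \dfrac{2n - 1}{2n},
\end{cases} \tag{4}
\]

where

\[ f(u_1, u_2, \ldots, u_n) = \begin{cases} n! & \text{if } 0 < u_1 < u_2 < \cdots < u_n < 1, \\ 0 & \text{otherwise,} \end{cases} \tag{5} \]

is the joint pdf of an order statistic for a sample of size $n$ from $U(0, 1)$.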


We will not prove this result here and instead refer to Gibbons [36], page
77. Let $D_{n,\alpha}$ be the upper $\alpha$-percent point of the distribution of $D_n$, that is,
$P\{D_n > D_{n,\alpha}\} \leq \alpha$. The exact distribution of $D_n$ for selected values of $n$ and
$\alpha$ has been tabulated by Miller [81], Owen [88], and Birnbaum [8]. The
large-sample distribution of $D_n$ was derived by Kolmogorov [60], and we state it
without proof.


Theorem 3. Let $F$ be any continuous df. Then for every $z > 0$

\[ \lim_{n \to \infty} P\{D_n \leq z n^{-1/2}\} = L(z), \tag{6} \]

where

\[ L(z) = 1 - 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 z^2}. \tag{7} \]

Theorem 3 can be used to find $d_\alpha$ such that $\lim_{n \to \infty} P\{\sqrt{n}\, D_n \leq d_\alpha\} =
1 - \alpha$. Tables of $d_\alpha$ for various values of $\alpha$ are also available in Owen [88].
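
When tables are not at hand, $d_\alpha$ can be approximated numerically from the limiting df in Theorem 3. The Python sketch below truncates the series for $L(z)$ and solves $L(d) = 1 - \alpha$ by bisection; the truncation length, the bracketing interval, and the function names are assumptions chosen for illustration.

```python
from math import exp


def kolmogorov_cdf(z, terms=100):
    """Truncated series L(z) = 1 - 2 * sum_{i>=1} (-1)^(i-1) exp(-2 i^2 z^2)."""
    if z <= 0:
        return 0.0
    series = sum((-1) ** (i - 1) * exp(-2.0 * i * i * z * z)
                 for i in range(1, terms + 1))
    return 1.0 - 2.0 * series


def d_alpha(alpha, lo=0.2, hi=3.0, tol=1e-10):
    """Solve L(d) = 1 - alpha by bisection; L is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kolmogorov_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for a in (0.10, 0.05, 0.01):
    print(a, round(d_alpha(a), 4))
```

For the usual levels the first term of the series dominates, so a few terms already give the critical value to several decimal places.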

The statistics $D_n^+$ and $D_n^-$ have the same distribution because of symmetry,
and their common distribution is given by the following theorem.


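Theorem 4. Let $F$ be a continuous df. Then

\[
P\{D_n^+ < z\} =
\begin{cases}
0 & \text{if } z \leq 0, \\[4pt]
\displaystyle \int_{1 - z}^{1} \int_{[(n-1)/n] - z}^{u_n} \cdots \int_{(2/n) - z}^{u_3} \int_{(1/n) - z}^{u_2} f(u_1, u_2, \ldots, u_n) \, du_1 \cdots du_n & \text{if } 0 < z < 1, \\[4pt]
1 & \text{if } z \geq 1,
\end{cases} \tag{8}
\]

where $f$ is given by (5).

Proof. We leave the reader to prove Theorem 4.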


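Tables for the critical values $D_{n,\alpha}^+$, where $P\{D_n^+ > D_{n,\alpha}^+\} \leq \alpha$, are also
available for selected values of $n$ and $\alpha$; see Birnbaum and Tingey [7]. Table
7 on page 661 gives $D_{n,\alpha}$ and $D_{n,\alpha}^+$ for some selected values of $n$ and $\alpha$. For
large samples Smirnov [121] showed that

\[ \lim_{n \to \infty} P\{\sqrt{n}\, D_n^+ \leq z\} = 1 - e^{-2z^2}, \qquad z \geq 0. \tag{9} \]

In fact, in view of (9), the statistic $V_n = 4n (D_n^+)^2$ has a limiting $\chi^2(2)$
distribution, for $4n (D_n^+)^2 \leq 4z^2$ if and only if $\sqrt{n}\, D_n^+ \leq z$, $z \geq 0$, and the result
follows since

\[ \lim_{n \to \infty} P\{V_n \leq 4z^2\} = 1 - e^{-2z^2}, \qquad z \geq 0, \]

so that

\[ \lim_{n \to \infty} P\{V_n \leq x\} = 1 - e^{-x/2}, \qquad x \geq 0, \]

which is the df of a $\chi^2(2)$ rv.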

Example 1. Let $\alpha = .01$, and let us approximate $D_{n,.01}^+$. We have $\chi^2_{2,.01}
= 9.21$. Thus $V_n = 9.21$, yielding

\[ D_{n,.01}^+ = \sqrt{\frac{9.21}{4n}} = \frac{3.03}{2\sqrt{n}}. \]

If, for example, $n = 9$, then $D_{9,.01}^+ = 3.03/6 = .50$. Of course, the
approximation is better for large $n$.
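
The arithmetic of Example 1 amounts to solving $1 - e^{-2z^2} = 1 - \alpha$ for $z$ and dividing by $\sqrt{n}$. A minimal Python sketch follows; the function name is an assumption, and the result is only the large-sample approximation, not the exact tabulated value.

```python
from math import log, sqrt


def one_sided_critical_value(n, alpha):
    """Approximate the one-sided critical value from (9):
    exp(-2 z^2) = alpha with z = sqrt(n) * D, so D = sqrt(-ln(alpha) / (2n))."""
    return sqrt(-log(alpha) / (2.0 * n))


approx = one_sided_critical_value(9, 0.01)
print(round(approx, 2))
```

Since $-2 \ln \alpha$ is the upper $\alpha$ point of the $\chi^2(2)$ distribution, this is the same computation as taking $\sqrt{9.21/(4n)}$ in the example.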

The statistic $D_n$ and its one-sided analogues can be used in testing $H_0$:
$X \sim F_0$ against $H_1$: $X \sim F$, where $F_0(x) \neq F(x)$ for some $x$.


Definition 2. To test $H_0$: $F(x) = F_0(x)$ for all $x$ at level $\alpha$, the
Kolmogorov-Smirnov test rejects $H_0$ if $D_n > D_{n,\alpha}$. Similarly, it rejects $F(x) \geq F_0(x)$ for
all $x$ if $D_n^- > D_{n,\alpha}^+$, and rejects $F(x) \leq F_0(x)$ for all $x$ at level $\alpha$ if $D_n^+ > D_{n,\alpha}^+$.


For large samples we can approximate by using Theorem 3 or (9) to obtain
an approximate $\alpha$ level test.

Example 2. A die is rolled 15 times with the following results:

Face value:   1   2   3   4   5   6
Frequency:    0   1   4   0   4   6

Let us test $H_0$ that the die is fair, that is, test $H_0$: $P\{X = x\} = \frac{1}{6}$,
$x = 1, 2, \ldots, 6$, against all alternatives. We have the following:

x:                          <1     1      2      3      4      5      6     >6
$F_0(x)$:                    0    1/6    2/6    3/6    4/6    5/6     1      1
$F_{15}^*(x)$:               0     0     1/15   5/15   5/15   9/15    1      1
$|F_{15}^*(x) - F_0(x)|$:    0    1/6    4/15   1/6    1/3    7/30    0      0

\[ D_{15} = \sup_x |F_{15}^*(x) - F_0(x)| = \tfrac{1}{3} = .333. \]

Since $D_{15,.10} = .304$, we reject $H_0$ at level $\alpha = .10$.
Let us now apply the chi-square test to the same data, noting that the
number of observations is not large enough. We have the following:

Face value:   1     2     3     4     5     6
$x_i$:        0     1     4     0     4     6
$np_i$:      5/2   5/2   5/2   5/2   5/2   5/2

where $\mathbf{x} = (x_1, \ldots, x_6)$, $x_i$ is the number of times $i$ shows up, $i = 1, 2, \ldots, 6$,
and $\mathbf{p} = (\frac{1}{6}, \frac{1}{6}, \ldots, \frac{1}{6})$. Let us combine the first two classes, the third and
fourth, and the last two to make the expected frequency in each class at
least 5. Then $U = 8.4$. Since $\chi^2_{2,.10} = 4.6$, we reject $H_0$ at level $\alpha = .10$. We
emphasize that the application of the chi-square test to this example is
somewhat dubious.
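
Both computations in Example 2 can be reproduced exactly with rational arithmetic. The Python sketch below uses the frequencies given in the example; the variable names are assumptions, and the pooling of faces follows the grouping described in the text.

```python
from fractions import Fraction

freq = [0, 1, 4, 0, 4, 6]       # observed frequencies for faces 1, ..., 6
n = sum(freq)                    # 15 rolls

# Kolmogorov-Smirnov: both step functions jump only at the face values,
# so the supremum is attained at one of the faces.
cumulative = 0
d_n = Fraction(0)
for face, f in enumerate(freq, start=1):
    cumulative += f
    d_n = max(d_n, abs(Fraction(cumulative, n) - Fraction(face, 6)))

# Chi-square after pooling faces (1, 2), (3, 4), (5, 6); expected count 5 each.
pooled = [freq[0] + freq[1], freq[2] + freq[3], freq[4] + freq[5]]
expected = Fraction(n, 3)
chi_sq = sum((obs - expected) ** 2 / expected for obs in pooled)

print(d_n, float(d_n))        # 1/3, about .333
print(chi_sq, float(chi_sq))  # 42/5 = 8.4
```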


Example 3. Let us consider the data in Example 10.3.3, and apply the 
Kolmogorov-Smirnov test to determine the goodness of the fit. Rearranging 
the data in increasing order of magnitude, we have the following result: 


   x       $F_0(x)$   $F_{20}^*(x)$   $|F_{20}^*(x) - F_0(x)|$
-1.787     .0367       1/20            .0133
-1.229     .1093       2/20            .0093
-0.525     .2998       3/20            .1498
-0.513     .3050       4/20            .1050
-0.508     .3050       5/20            .0550
-0.486     .3121       6/20            .0121
-0.482     .3156       7/20            .0344
-0.323     .3745       8/20            .0255
-0.261     .3974       9/20            .0526
-0.068     .4721      10/20            .0279
-0.057     .4761      11/20            .0739
 0.137     .5557      12/20            .0443
 0.464     .6772      13/20            .0272
 0.595     .7257      14/20            .0257
 0.881     .8106      15/20            .0606
 0.906     .8186      16/20            .0186
 1.046     .8531      17/20            .0031
 1.237     .8925      18/20            .0075
 1.678     .9535      19/20            .0035
 2.455     .9931      20/20            .0069


\[ D_{20} = \sup_x |F_{20}^*(x) - F_0(x)| = .1498. \]


Let us take $\alpha = .05$. Then $D_{20,.05} = .294$. Since $.1498 < .294$, we accept
$H_0$ at the .05 level of significance.


It is worthwhile to compare the chi-square test of goodness of fit and the 
Kolmogorov-Smirnov test. The latter treats individual observations directly, 
whereas the former discretizes the data and sometimes loses information 
through grouping. Moreover, the Kolmogorov-Smirnov test is applicable even 
in the case of very small samples, but the chi-square test is essentially for 
large samples. 

The chi-square test, on the other hand, can be easily modified to allow 
estimation of parameters from the data, but the Kolmogorov-Smirnov test 
does not have this flexibility. The chi-square test can be applied when the data 
are discrete or continuous, but the Kolmogorov-Smirnov test assumes con- 
tinuity of the df. This means that the latter test provides a more refined an- 
alysis of the data. If the distribution is actually discontinuous, the Kolmogorov- 
Smirnov test is conservative in that it tends to accept $H_0$.

We next turn our attention to some other uses of the Kolmogorov-Smirnov
statistic. Let $X_1, X_2, \ldots, X_n$ be a sample from a df $F$, and let $F_n^*$ be the sample
df. The estimate $F_n^*$ of $F$ for large $n$ should be close to $F$. Indeed,

\[ P\Big\{|F_n^*(x) - F(x)| \leq \lambda \frac{\sqrt{F(x)[1 - F(x)]}}{\sqrt{n}}\Big\} \geq 1 - \frac{1}{\lambda^2}, \tag{10} \]

and, since $F(x)[1 - F(x)] \leq \frac{1}{4}$, we have

\[ P\Big\{|F_n^*(x) - F(x)| \leq \frac{\lambda}{2\sqrt{n}}\Big\} \geq 1 - \frac{1}{\lambda^2}. \tag{11} \]

Thus $F_n^*$ can be made close to $F$ with high probability by choosing $\lambda$ and
large enough $n$. The Kolmogorov-Smirnov statistic enables us to determine
the smallest $n$ such that the error in estimate never exceeds a fixed value $\varepsilon$
with a large probability $1 - \alpha$. Since

\[ P\{D_n \leq \varepsilon\} \geq 1 - \alpha, \tag{12} \]

$\varepsilon = D_{n,\alpha}$; and, given $\varepsilon$ and $\alpha$, we can read $n$ from the tables. For large $n$, we
can use the asymptotic distribution of $D_n$ and solve $d_\alpha = \varepsilon \sqrt{n}$ for $n$.
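
To see what these bounds imply for the sample size, the following Python sketch compares the pointwise Chebyshev bound (11) with the large-sample relation $d_\alpha = \varepsilon\sqrt{n}$. It is a rough illustration only: $d_\alpha$ is approximated by keeping the leading term of the series for $L(z)$, and the function names and the choice $\varepsilon = .1$, $\alpha = .05$ are assumptions.

```python
from math import ceil, log, sqrt


def n_chebyshev(eps, alpha):
    """Smallest n with lambda / (2 sqrt(n)) <= eps when 1 / lambda^2 = alpha.
    The small offset guards against floating-point round-off before ceil()."""
    return ceil(1.0 / (4.0 * alpha * eps * eps) - 1e-9)


def n_kolmogorov(eps, alpha):
    """Smallest n with d_alpha <= eps * sqrt(n), where d_alpha is taken from
    the leading term of the limiting series: 2 exp(-2 d^2) = alpha."""
    d = sqrt(-log(alpha / 2.0) / 2.0)
    return ceil((d / eps) ** 2)


print(n_chebyshev(0.1, 0.05))   # pointwise guarantee at a single x
print(n_kolmogorov(0.1, 0.05))  # uniform guarantee over all x (asymptotic)
```

The Kolmogorov-Smirnov requirement is smaller even though it controls the error uniformly in $x$, because Chebyshev's inequality is a crude bound while the limiting distribution of $D_n$ is used exactly (up to the truncation of the series).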

We can also form confidence bounds for $F$. Given $\alpha$ and $n$, we first find
$D_{n,\alpha}$ such that

\[ P\{D_n > D_{n,\alpha}\} \leq \alpha, \tag{13} \]

which is the same as

\[ P\{\sup_x |F_n^*(x) - F(x)| \leq D_{n,\alpha}\} \geq 1 - \alpha. \]

Thus

\[ P\{|F_n^*(x) - F(x)| \leq D_{n,\alpha} \text{ for all } x\} \geq 1 - \alpha. \tag{14} \]

Define

\[ L_n(x) = \max\{F_n^*(x) - D_{n,\alpha}, \ 0\} \tag{15} \]

and

\[ U_n(x) = \min\{F_n^*(x) + D_{n,\alpha}, \ 1\}. \tag{16} \]

Then the region between $L_n(x)$ and $U_n(x)$ can be used as a confidence band
for $F(x)$ with associated confidence coefficient $1 - \alpha$.


Example 4. For the data on the standard normal distribution of Example
3, let us form a .90 confidence band for the df. We have $D_{20,.10} = .265$. The
confidence band is, therefore, $F_{20}^*(x) \pm .265$ as long as the band is between
0 and 1.
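
The band defined by (15) and (16) is straightforward to evaluate at the ordered sample points. The Python sketch below is a minimal illustration; the function name and the four data values are hypothetical, and the critical value is passed in as a parameter because it must be read from tables for the actual sample size.

```python
def confidence_band(sample, d_crit):
    """Return (x, lower, upper) at each ordered sample point, clipping the
    band F*_n(x) +/- d_crit to the interval [0, 1]. Assumes no ties."""
    xs = sorted(sample)
    n = len(xs)
    band = []
    for i, x in enumerate(xs, start=1):
        f_star = i / n                       # empirical df at the i-th order statistic
        lower = max(f_star - d_crit, 0.0)    # L_n(x)
        upper = min(f_star + d_crit, 1.0)    # U_n(x)
        band.append((x, lower, upper))
    return band


band = confidence_band([-0.5, 0.3, 1.2, 2.0], 0.265)
for x, lower, upper in band:
    print(x, round(lower, 3), round(upper, 3))
```

Between consecutive order statistics the empirical df is constant, so the band is a pair of step functions and these values describe it completely.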


The Problem of Location


Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from some unknown df $F$. Let $p$
be a positive real number, $0 < p < 1$, and let $\kappa_p(F)$ denote the quantile of
order $p$ for the df $F$. In the following analysis we assume that $F$ is absolutely
continuous. The problem of location is to test $H_0$: $\kappa_p(F) = \kappa_0$, $\kappa_0$ a given
number, against one of the alternatives $\kappa_p > \kappa_0$, $\kappa_p < \kappa_0$, and $\kappa_p \neq \kappa_0$. The
problem of location and symmetry is to test $H_0$: $\kappa_{.5}(F) = \kappa_0$, and $F$ is
symmetric against $H_1$: $\kappa_{.5}(F) \neq \kappa_0$ or $F$ is not symmetric.


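The Sign Test. Let $X_1, X_2, \ldots, X_n$ be iid rv's with common pdf $f$. The
problem is to test

\[ H_0\colon \kappa_p(f) = \kappa_0 \quad \text{against} \quad H_1\colon \kappa_p(f) > \kappa_0, \tag{17} \]

where $\kappa_p(f)$ is the quantile of order $p$ for $f$, $0 < p < 1$. Let $i(X_1, X_2, \ldots, X_n)$
be the number of positive elements in $X_1 - \kappa_0, X_2 - \kappa_0, \ldots, X_n - \kappa_0$. Note
that $P\{X_i = \kappa_0\} = 0$, since the $X_i$ are continuous-type rv's. It can be shown (see
Fraser [33], 167-170) that a UMP test of $H_0$ against $H_1$ is given by

\[
\varphi(x_1, x_2, \ldots, x_n) =
\begin{cases}
1, & i(x_1, x_2, \ldots, x_n) > c, \\
\gamma, & i(x_1, x_2, \ldots, x_n) = c, \\
0, & i(x_1, x_2, \ldots, x_n) < c,
\end{cases} \tag{18}
\]

where $c$ and $\gamma$ are chosen from the size restriction

\[ \alpha = \sum_{k = c + 1}^{n} \binom{n}{k} q^k p^{n - k} + \gamma \binom{n}{c} q^c p^{n - c}, \tag{19} \]

with $q = 1 - p$. This is so because, under $H_0$, $\kappa_p(f) = \kappa_0$, so that $P\{X \leq \kappa_0\}
= p$, $P\{X > \kappa_0\} = 1 - p = q$, and $i(\mathbf{X})$ has a $b(n, q)$ distribution.

The same test is UMP for $H_0$: $\kappa_p(f) \leq \kappa_0$ against $\kappa_p(f) > \kappa_0$. For the
hypothesis $H_0$: $\kappa_p(f) = \kappa_0$ against the two-sided alternative $\kappa_p(f) \neq \kappa_0$, the
two-sided sign test can be shown to be UMP unbiased. (See Fraser [33], 171.)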


Example 5. Entering college freshmen have taken a particular high school
achievement test for many years, and the upper quartile ($p = .75$) is well
established at a score of 195. A particular high school sent 12 of its
graduates to college, where they took the examination and obtained scores of
203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216. Let us test the
null hypothesis $H_0$ that $\kappa_{.75} \leq 195$ against $H_1$: $\kappa_{.75} > 195$ at the $\alpha = .05$
level.
We have to find $c$ and $\gamma$ such that

\[ \sum_{i = c + 1}^{12} \binom{12}{i} \Big(\frac{1}{4}\Big)^i \Big(\frac{3}{4}\Big)^{12 - i} + \gamma \binom{12}{c} \Big(\frac{1}{4}\Big)^c \Big(\frac{3}{4}\Big)^{12 - c} = .05. \]

From the table of cumulative binomial distribution (Table 1, page 648) for
$n = 12$, $p = \frac{1}{4}$, we see that $c = 6$. Then $\gamma$ is given by

\[ \gamma = \frac{.05 - \sum_{i = 7}^{12} \binom{12}{i} (\frac{1}{4})^i (\frac{3}{4})^{12 - i}}{\binom{12}{6} (\frac{1}{4})^6 (\frac{3}{4})^6} = \frac{.0358}{.0402} = .89. \]

In our case the number of positive signs, $x_i - 195$, $i = 1, 2, \ldots, 12$, is 7,
so that we reject $H_0$ that the upper quartile is $\leq 195$.
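
The constants $c$ and $\gamma$ of the randomized sign test can also be found without tables by accumulating the upper binomial tail. The Python sketch below is one way to do this under the size restriction (19); the function name is an assumption, and the first call uses the values $n = 12$, $q = .25$, $\alpha = .05$ of Example 5.

```python
from math import comb


def sign_test_constants(n, q, alpha):
    """Return (c, gamma) with P{S > c} + gamma * P{S = c} = alpha,
    where S ~ b(n, q) is the number of positive differences under H0."""
    pmf = [comb(n, k) * q ** k * (1 - q) ** (n - k) for k in range(n + 1)]
    tail = 0.0                       # running value of P{S > c}
    for c in range(n, -1, -1):
        if tail + pmf[c] > alpha:    # including all of P{S = c} would exceed alpha
            return c, (alpha - tail) / pmf[c]
        tail += pmf[c]
    return -1, 0.0                   # only reached when alpha >= 1


c, gamma = sign_test_constants(12, 0.25, 0.05)
print(c, round(gamma, 2))

# Number of positive signs among the scores of Example 5.
scores = [203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216]
positives = sum(1 for s in scores if s - 195 > 0)
print(positives, positives > c)
```

The same function applies to a median test by taking $q = .5$.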


Example 6. A random sample of size 8 is taken from a normal population
with mean 0 and variance 1. The sample values are $-.465$, $.120$, $-.238$,
$-.869$, $-1.016$, $.417$, $.056$, $.561$. Let us test hypothesis $H_0$: $\mu = -1.0$
against $H_1$: $\mu > -1.0$. We should expect to reject $H_0$ since we know that
it is false. The number of observations, $x_i - \mu_0 = x_i + 1.0$, that are $> 0$
is 7. We have to find $c$ and $\gamma$ such that

\[ \sum_{i = c + 1}^{8} \binom{8}{i} \Big(\frac{1}{2}\Big)^8 + \gamma \binom{8}{c} \Big(\frac{1}{2}\Big)^8 = .05, \]

that is,

\[ \sum_{i = c + 1}^{8} \binom{8}{i} + \gamma \binom{8}{c} = 12.8. \]

We see that $c = 6$ and $\gamma = .13$. Since the number of positive $x_i - \mu_0$ is
$> 6$, we reject $H_0$.
Let us now apply the parametric test here. We have $\bar{x} = -.179$.
Since $\sigma = 1$, we reject $H_0$ if

\[ \bar{x} > \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}} = -1.0 + \frac{1.645}{\sqrt{8}} = -.42. \]

Since $-.179 > -.42$, we reject $H_0$.


The single-sample sign test described above can be easily modified to apply
to sampling from a bivariate population. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$
be a random sample from a bivariate population. Let $Z_i = X_i - Y_i$, $i =
1, 2, \ldots, n$, and assume that $Z_i$ has an absolutely continuous df. Then one
can test hypotheses concerning the order parameters of $Z$ by using the sign
test. A hypothesis of interest here is that $Z$ has a given median $m_0 = \kappa_{1/2}(Z)$.
Without loss of generality let $m_0 = 0$. Then $H_0$: $\operatorname{med}(Z) = 0$, that is,
$P\{Z > 0\} = P\{Z < 0\} = \frac{1}{2}$. Note that $\operatorname{med}(Z)$ is not necessarily equal to
$\operatorname{med}(X) - \operatorname{med}(Y)$, so that $H_0$ is not that $\operatorname{med}(X) = \operatorname{med}(Y)$ but that
$\operatorname{med}(Z) = 0$. The sign test is UMP against one-sided alternatives and UMP
unbiased against two-sided alternatives.

Example 7. We consider an example due to Hahn and Nelson [43], in
which two measuring devices take readings on each of 10 test units. Let
$X$ and $Y$, respectively, be the readings on a test unit by the first and
second measuring devices. Let $X = A + \varepsilon$, $Y = A + \eta$, where $A$, $\varepsilon$, $\eta$,
respectively, are the contributions to the readings due to the test unit
and to the first and the second measuring devices. Let $A$, $\varepsilon$, $\eta$ be independent
with $EA = \mu$, $\operatorname{var}(A) = \sigma_a^2$, $E\varepsilon = E\eta = 0$, $\operatorname{var}(\varepsilon) = \sigma_1^2$, $\operatorname{var}(\eta) = \sigma_2^2$, so that
$X$ and $Y$ have common mean $\mu$ and variances $\sigma_1^2 + \sigma_a^2$ and $\sigma_2^2 + \sigma_a^2$,
respectively. Also, the covariance between $X$ and $Y$ is $\sigma_a^2$. The data are as follows:

Test unit:            1     2     3     4     5     6     7     8     9    10
First device, X:     71   108    72   140    61    97    90   127   101   114
Second device, Y:    77   105    71   152    88   117    93   130   112   105
Z = X - Y:           -6     3     1   -12   -27   -20    -3    -3   -11     9

Let us test the hypothesis $H_0$: $\operatorname{med}(Z) = 0$. The number of $Z_i$'s $> 0$ is 3.
We have

\[ P\{\text{Number of } Z_i\text{'s} > 0 \text{ is} \leq 3 \mid H_0\} = \sum_{k=0}^{3} \binom{10}{k} \Big(\frac{1}{2}\Big)^{10} = .172. \]

Using the two-sided sign test, we have to accept $H_0$ at level $\alpha = .05$, since
$.172 > .025$. The rv's $Z_i$ can be considered to be distributed normally, so
that under $H_0$ the common mean of the $Z_i$'s is 0. Using a paired comparison
t-test on the data, we can show that $t = -.88$ for 9 d.f., so that we accept
the hypothesis of equality of means of $X$ and $Y$ at level $\alpha = .05$.

The Wilcoxon Signed-Ranks Test. The sign test loses information since
it ignores the magnitude of the difference between the observations and the
hypothesized quantile. The Wilcoxon signed-ranks test provides an alternative
test of location (and symmetry) that also takes into account the magnitudes
of these differences.
Let $X_1, X_2, \ldots, X_n$ be iid rv's with common absolutely continuous df $F$,
which is symmetric about the median $m$. The problem is to test $H_0$: $m = m_0$
against the usual one- or two-sided alternatives. Without loss of generality,
we assume that $m_0 = 0$. Then $F(-x) = 1 - F(x)$ for all $x \in \mathcal{R}$. To test $H_0$:
$F(0) = \frac{1}{2}$ or $m = 0$, we first arrange $|X_1|, |X_2|, \ldots, |X_n|$ in increasing order of
magnitude, and assign ranks $1, 2, \ldots, n$, keeping track of the original signs
of $X_i$. For example, if $n = 4$ and $|X_2| < |X_4| < |X_1| < |X_3|$, the rank of
$|X_1|$ is 3, of $|X_2|$ is 1, of $|X_3|$ is 4, and of $|X_4|$ is 2.


Let

\[
\begin{cases}
T^+ = \text{the sum of the ranks of positive } X_i\text{'s}, \\
T^- = \text{the sum of the ranks of negative } X_i\text{'s}.
\end{cases} \tag{20}
\]
Then, under $H_0$, we expect $T^+$ and $T^-$ to be the same. Note that

\[ T^+ + T^- = \sum_{i=1}^{n} i = \frac{n(n + 1)}{2}, \tag{21} \]

so that $T^+$ and $T^-$ are linearly related and offer equivalent criteria. Let us
define

\[ Z_i = \begin{cases} 1 & \text{if } X_i > 0, \\ 0 & \text{if } X_i < 0, \end{cases} \qquad i = 1, 2, \ldots, n, \tag{22} \]

and write $r(|X_i|) = r_i$ for the rank of $|X_i|$. Then $T^+ = \sum_{i=1}^{n} r_i Z_i$ and
$T^- = \sum_{i=1}^{n} (1 - Z_i) r_i$. Also,

\[ T^+ - T^- = -\sum_{i=1}^{n} r_i + 2 \sum_{i=1}^{n} Z_i r_i = 2 \sum_{i=1}^{n} r_i Z_i - \frac{n(n + 1)}{2}. \tag{23} \]

The statistic $T^+$ (or $T^-$) is known as the Wilcoxon statistic. A large value of
$T^+$ (or, equivalently, a small value of $T^-$) means that most of the large
deviations from 0 are positive, and therefore we reject $H_0$ in favor of the
alternative, $H_1$: $m > 0$.

A similar analysis applies to the other two alternatives. We record the
results as follows:

          Test
$H_0$        $H_1$          Reject $H_0$ if
$m = 0$      $m > 0$        $T^+ > c_1$
$m = 0$      $m < 0$        $T^+ < c_2$
$m = 0$      $m \neq 0$     $T^+ < c_3$ or $T^+ > c_4$


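Let us next compute the distribution of $T^+$. Let

\[ Z_{(i)} = \begin{cases} 1 & \text{if the } X_j \text{ for which } |X_j| \text{ has rank } i \text{ is} > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{24} \]

Clearly $T^+ = \sum_{i=1}^{n} i Z_{(i)}$. The rv's $Z_{(i)}$ are independent Bernoulli rv's but are
not necessarily identically distributed. We have

\[
\begin{aligned}
EZ_{(i)} &= P\{Z_{(i)} = 1\} = P\{[r(|X_j|) = i, \ X_j > 0] \text{ for some } j\} \\
&= P\{i\text{th-order statistic in } |X_1|, \ldots, |X_n| \text{ corresponds to a positive } X_j\} \\
&= n \binom{n - 1}{i - 1} \int_0^{\infty} [F(u) - F(-u)]^{i - 1} [1 - F(u) + F(-u)]^{n - i} f(u) \, du,
\end{aligned} \tag{25}
\]

where $f$ is the pdf of $X$. Moreover,

\[ \operatorname{var}(Z_{(i)}) = EZ_{(i)} (1 - EZ_{(i)}) \tag{26} \]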
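and

\[ \operatorname{cov}(Z_{(i)}, Z_{(j)}) = 0, \qquad i \neq j. \tag{27} \]

Under $H_0$, $X$ is symmetric about 0 so that $F(0) = \frac{1}{2}$, $F(-u) = 1 - F(u)$
for all $u > 0$. Thus

\[ EZ_{(i)} = n \binom{n - 1}{i - 1} \int_0^{\infty} [2F(u) - 1]^{i - 1} [2 - 2F(u)]^{n - i} f(u) \, du. \tag{28} \]

Letting $v = 2F(u) - 1$, we have

\[
\begin{aligned}
EZ_{(i)} &= \frac{n}{2} \binom{n - 1}{i - 1} \int_0^1 v^{i - 1} (1 - v)^{n - i} \, dv \\
&= \frac{n}{2} \binom{n - 1}{i - 1} B(i, n - i + 1) \\
&= \frac{1}{2}.
\end{aligned} \tag{29}
\]

The general moments of $T^+$ are given by

\[ ET^+ = \sum_{i=1}^{n} i \, EZ_{(i)} \tag{30} \]

and

\[ \operatorname{var}(T^+) = \sum_{i=1}^{n} i^2 \, [EZ_{(i)} (1 - EZ_{(i)})], \tag{31} \]

so that

\[ E_{H_0} T^+ = \frac{n(n + 1)}{4} \tag{32} \]

and

\[ \operatorname{var}_{H_0}(T^+) = \frac{n(n + 1)(2n + 1)}{24}. \tag{33} \]

For large $n$, the Lindeberg condition is satisfied (prove it), so that

\[ \frac{T^+ - E_{H_0} T^+}{\sqrt{[n(n + 1)(2n + 1)]/24}} \xrightarrow{L} Z, \]

where $Z$ is $\mathcal{N}(0, 1)$.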

We next compute the distribution of $T^+$ for small samples. The distribution
of $T^+$ is tabulated by Kraft and Van Eeden [62], pages 221-223.

Note that $T^+ = 0$ if all differences have negative signs, and $T^+ = n(n + 1)/2$
if all differences have positive signs. Here a difference means a difference
between the observations and the postulated value of the median. $T^+$ is
completely determined by the indicators $Z_{(i)}$, so that the sample space can
be considered as a set of $2^n$ $n$-tuples $(z_1, z_2, \ldots, z_n)$, where each $z_i$ is 0 or 1.
Under $H_0$, $m = m_0$ and each arrangement is equally likely. Thus

\[ P_{H_0}\{T^+ = t\} = \frac{\{\text{number of ways to assign} + \text{or} - \text{signs to integers } 1, 2, \ldots, n \text{ so that the sum is } t\}}{2^n} \tag{34} \]

\[ = \frac{n(t)}{2^n}, \quad \text{say.} \tag{35} \]

Note that every assignment has a conjugate assignment with plus and
minus signs interchanged, so that for this conjugate $T^+$ is given by

\[ \sum_{i=1}^{n} i (1 - Z_{(i)}) = \frac{n(n + 1)}{2} - \sum_{i=1}^{n} i Z_{(i)}. \tag{36} \]

Thus under $H_0$ the distribution of $T^+$ is symmetric about the mean $n(n + 1)/4$.


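Example 8. Let us compute the null distribution for $n = 3$: $E_{H_0} T^+ =
n(n + 1)/4 = 3$, and $T^+$ takes values from 0 to $n(n + 1)/2 = 6$:

                 Ranks Associated with
Value of $T^+$   Positive Differences     $n(t)$
      6          1, 2, 3                    1
      5          2, 3                       1
      4          1, 3                       1
      3          (1, 2); 3                  2

so that

\[ P_{H_0}\{T^+ = t\} = \begin{cases} \frac{1}{8}, & t = 4, 5, 6, 0, 1, 2, \\ \frac{2}{8}, & t = 3, \\ 0, & \text{otherwise.} \end{cases} \tag{37} \]

Similarly, for $n = 4$, one can show that

\[ P_{H_0}\{T^+ = t\} = \begin{cases} \frac{1}{16}, & t = 0, 1, 2, 8, 9, 10, \\ \frac{2}{16}, & t = 3, 4, 5, 6, 7, \\ 0, & \text{otherwise.} \end{cases} \tag{38} \]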


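An alternative procedure would be to use the mgf technique. Under $H_0$,
the rv's $iZ_{(i)}$ are independent and have the pmf

$P\{iZ_{(i)} = i\} = P\{iZ_{(i)} = 0\} = \frac12.$

Thus
$M(t) = Ee^{tT^+}$
(39)    $= \prod_{i=1}^n \left(\frac{e^{it} + 1}{2}\right).$

We express M(t) as a sum of terms of the form $\alpha_j e^{jt}/2^n$. The pmf of $T^+$ can
then be determined by inspection. For example, in the case n = 4, we have
$M(t) = \prod_{i=1}^4 \left(\frac{e^{it} + 1}{2}\right) = \left(\frac{e^t + 1}{2}\right)\left(\frac{e^{2t} + 1}{2}\right)\left(\frac{e^{3t} + 1}{2}\right)\left(\frac{e^{4t} + 1}{2}\right)$
(40)    $= \frac14(e^{3t} + e^{2t} + e^t + 1)\left(\frac{e^{3t} + 1}{2}\right)\left(\frac{e^{4t} + 1}{2}\right)$
(41)    $= \frac18(e^{6t} + e^{5t} + e^{4t} + 2e^{3t} + e^{2t} + e^t + 1)\left(\frac{e^{4t} + 1}{2}\right)$
(42)    $= \frac1{16}(e^{10t} + e^{9t} + e^{8t} + 2e^{7t} + 2e^{6t} + 2e^{5t} + 2e^{4t} + 2e^{3t} + e^{2t} + e^t + 1).$

This method gives us the pmf of $T^+$ for n = 2, n = 3, and n = 4 immediately.
Quite simply,
(43)    $P_{H_0}\{T^+ = j\}$ = coefficient of $e^{jt}$ in the expansion of M(t), j = 0,
1, ..., n(n + 1)/2.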
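
The coefficient extraction in (43) can be mechanized. The following is a minimal sketch, not part of the original text: it multiplies out the factors $(1 + x^i)$, with x standing for $e^t$, so that the coefficient of $x^j$ counts the sign assignments giving $T^+ = j$. The function names are illustrative assumptions.

```python
from fractions import Fraction


def signed_rank_counts(n):
    """counts[j] = number of subsets of {1, ..., n} whose elements sum to j."""
    top = n * (n + 1) // 2
    counts = [0] * (top + 1)
    counts[0] = 1  # the empty subset: every difference has a negative sign
    for i in range(1, n + 1):
        # Multiply the current polynomial by (1 + x**i). Updating from the
        # highest power down ensures rank i is used at most once.
        for j in range(top, i - 1, -1):
            counts[j] += counts[j - i]
    return counts


def signed_rank_pmf(n):
    """Null pmf of T+ as exact fractions: P{T+ = j} = counts[j] / 2**n."""
    return [Fraction(c, 2 ** n) for c in signed_rank_counts(n)]


if __name__ == "__main__":
    for n in (3, 4):
        print(n, signed_rank_counts(n))
```

For n = 3 and n = 4 the counts agree with the numerators found in Example 8.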


Example 9. Let us return to the data of Example 6 and test $H_0$: $m =
\mu = -1.0$ against $H_1$: $m > -1.0$. Ranking $|x_i - m|$ in increasing order
of magnitude, we have

.016 < .131 < .535 < .762 < 1.056 < 1.120 < 1.417 < 1.561
  5      4      1      3      7       2       6       8

Thus

$r_1 = 3$, $r_2 = 6$, $r_3 = 4$, $r_4 = 2$, $r_5 = 1$, $r_6 = 7$, $r_7 = 5$, $r_8 = 8$,

and
$T^+ = 3 + 6 + 4 + 2 + 7 + 5 + 8 = 35.$

From Table 10 on page 665, $H_0$ is rejected at level α = .05 if $T^+ \ge 31$. Since
35 > 31, we reject $H_0$.


Remark 1. The Wilcoxon test statistic can also be used to test for symmetry.
Let $X_1, X_2, \ldots, X_n$ be iid observations on an rv with absolutely continuous
df F. We set the null hypothesis as

$H_0$: median $m = m_0$, and df F is symmetric about $m_0$.
The alternative is
$H_1$: $m \ne m_0$ and F symmetric, or F asymmetric.
The test is the same since the null distribution of $T^+$ is the same.

Remark 2. If we have n independent pairs of observations $(X_1, Y_1), (X_2, Y_2),
\ldots, (X_n, Y_n)$ from a bivariate df, we form the differences $Z_i = X_i - Y_i$,
i = 1, 2, ..., n. Assuming that $Z_1, Z_2, \ldots, Z_n$ are (independent) observations
from a population of differences with absolutely continuous df F that
is symmetric with median m, we can use the Wilcoxon statistic to test $H_0$:
$m = m_0$.


We present some examples. 


Example 10. For the data of Example 10.3.3 let us apply the Wilcoxon
statistic to test $H_0$: m = 0 and F is symmetric against $H_1$: $m \ne 0$ and F
symmetric or F not symmetric.

The absolute values, when arranged in increasing order of magnitude, are
as follows:

.057 < .068 < .137 < .261 < .323 < .464 < .482 < .486 < .508 < .513
 13      5      2     17      4      1     11     15     20      7

< .525 < .595 < .881 < .906 < 1.046 < 1.229 < 1.237 < 1.678 < 1.787 < 2.455
    8      9     10      6     19      14      18      12      16       3

Thus
$r_1 = 6$, $r_2 = 3$, $r_3 = 20$, $r_4 = 5$, $r_5 = 2$, $r_6 = 14$,
$r_7 = 10$, $r_8 = 11$, $r_9 = 12$, $r_{10} = 13$, $r_{11} = 7$, $r_{12} = 18$,
$r_{13} = 1$, $r_{14} = 16$, $r_{15} = 8$, $r_{16} = 19$, $r_{17} = 4$, $r_{18} = 17$,
$r_{19} = 15$, $r_{20} = 9$,
and

$T^+ = 6 + 3 + 20 + 14 + 12 + 13 + 18 + 17 + 15 = 118.$
Since n = 20, $E_{H_0}T^+ = 20 \times 21/4 = 105$,
$\operatorname{var}_{H_0}(T^+) = \frac{20 \cdot 21 \cdot 41}{24} = 717.5.$

Thus
$\frac{T^+ - E_{H_0}T^+}{\sqrt{\operatorname{var}_{H_0}(T^+)}} = \frac{118 - 105}{26.8} = \frac{13}{26.8},$

and $P\{Z > 13/26.8\} \approx P\{Z > .5\} = .3085$. We therefore accept the null
hypothesis.
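
The arithmetic of the large-sample approximation can be checked with a short sketch such as the following. It is an illustration only: the function name is an assumption, and the normal tail is obtained from the standard library's complementary error function rather than from the book's tables.

```python
import math


def signed_rank_normal_approx(t_plus, n):
    """Standardize T+ using its null mean n(n+1)/4 and variance n(n+1)(2n+1)/24."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t_plus - mean) / math.sqrt(var)
    # Upper-tail probability P{Z > z} for a standard normal Z.
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))
    return mean, var, z, upper_tail


if __name__ == "__main__":
    # Values of Example 10: n = 20 and T+ = 118.
    print(signed_rank_normal_approx(118, 20))
```

The unrounded z is slightly below .5, so the tail probability computed this way comes out a little larger than the table value quoted for .5.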


Example 11. Returning to the data of Example 7, we apply the Wilcoxon
test to the differences $Z_i = X_i - Y_i$. The differences are -6, 3, 1, -8,
-17, -20, -3, -3, -11, 9. To test $H_0$: m = 0 against $H_1$: $m \ne 0$, we rank
the absolute values of $z_i$ in increasing order to get

1 < 3 = 3 = 3 < 6 < 8 < 9 < 11 < 17 < 20

and
$T^+ = 1 + 2 + 7 = 10.$
Here we have assigned ranks 2, 3, 4 to observations +3, -3, -3. (If we
assign rank 4 to observation +3, then $T^+ = 12$ without appreciably changing
the result.)


From the tables, we reject $H_0$ at α = .05 if either $T^+ > 46$ or $T^+ < 9$.
Since $T^+ = 10$ lies between 9 and 46, we accept $H_0$. Note that hypothesis $H_0$ was also
accepted by the sign test.


PROBLEMS 13.3 


1. Prove Theorem 4. 

2. Test the goodness of fit for the data of Problem 10.3.5, using the Kolmogorov-
Smirnov test.

3. Test the goodness of fit for the data of Problem 10.3.6, using the Kolmogorov- 
Smirnov test. 

4. For the data of Problem 3 find a .95 level confidence band for the distribution
function.


5. The following data represent a sample of size 20 from U[0, 1]: .277, .435, .130, 
.143, .853, .889, .294, .697, .940, .648, .324, .482, .540, .152, .477, .667, .741, 
.882, .885, .740. Construct a .90 level confidence band for F(x). 


6. In Problem 5 test the hypothesis that the distribution is U[0,1]. Take a = .05. 

7. For the data of Example 3 test, by means of the sign test, the null hypothesis
$H_0$: $\mu = 1.5$ against $H_1$: $\mu \ne 1.5$.

8. For the data of Problem 5 test the hypothesis that the quantile of order
p = .20 is .20.

9. For the data of Problem 10.4.8 use the sign test to test the hypothesis of no
difference between the two averages.

10. Use the sign test for the data of Problem 10.4.9 to test the hypothesis of no 
difference in grade-point averages. 

11. For the data of Problem 5 apply the signed-rank test to test $H_0$: m = .5
against $H_1$: $m \ne .5$.

12. For the data of Problems 10.4.8 and 10.4.9 apply the signed-rank test to the
differences to test $H_0$: m = 0 against $H_1$: $m \ne 0$.


13.4 SOME TWO-SAMPLE PROBLEMS 


In this section we consider some two-sample tests. Let $X_1, X_2, \ldots, X_m$ and
$Y_1, Y_2, \ldots, Y_n$ be independent samples from two absolutely continuous
distribution functions $F_X$ and $F_Y$, respectively. The problem is to test the null
hypothesis $H_0$: $F_X(x) = F_Y(x)$ for all $x \in \mathcal{R}$ against the usual one-sided and
two-sided alternatives. First note that $F_X \ge F_Y$ means

$P\{X \le x\} \ge P\{Y \le x\}$
so that
(1)    $P\{X > x\} \le P\{Y > x\}.$

Thus an alternative $F_X \ge F_Y$ means that the rv Y tends to be larger than the
rv X.

Definition 1. We say that a continuous rv Y is stochastically larger than a
continuous rv X if inequality (1) is satisfied.


A similar interpretation may be given to the one-sided alternative $F_X \le F_Y$.
In the special case where both X and Y are normal rv's with means $\mu_1$, $\mu_2$
and common variance $\sigma^2$, $F_X = F_Y$ corresponds to $\mu_1 = \mu_2$ and $F_X > F_Y$
corresponds to $\mu_1 < \mu_2$.

Some of the tests developed for single samples may be modified to test the
null hypothesis $H_0$: $F_X = F_Y$.


The Chi-Square Test for Homogeneity 


Since the analysis is quite general, let us consider the case of sampling
from p populations, $p \ge 2$. We wish to test $H_0$ that they all have the same
distribution. Let $A_1, A_2, \ldots, A_k$ be a partition of the real line, where the $A_i$'s
are Borel sets, and let

(2)    $P\{X_j \in A_i\} = p_{ij}, \quad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, p.$
If $X_1, X_2, \ldots, X_p$ all have the same df, then
$p_{i1} = p_{i2} = \cdots = p_{ip}, \quad i = 1, 2, \ldots, k.$

This is exactly the problem of testing the equality of p independent multinomial
distributions. Let $n_1, n_2, \ldots, n_p$ be the number of observations on $X_1,
X_2, \ldots, X_p$, respectively. If $n_1, n_2, \ldots, n_p$ are large, the rv

$V = \sum_{j=1}^p \sum_{i=1}^k \frac{(X_{ij} - n_j p_{ij})^2}{n_j p_{ij}}$

is the sum of p independent rv's each of which is asymptotically a chi-square
rv with k - 1 d.f. Here $X_{ij}$ is the number of observations on $X_j$ that lie in
the set $A_i$. Thus the sum V is asymptotically a chi-square rv with p(k - 1) d.f.,
provided that $n_1, n_2, \ldots, n_p$ are sufficiently large. In general, the $p_{ij}$'s will
not be known, and we will have to estimate them from the data. Under $H_0$:
$p_{i1} = p_{i2} = \cdots = p_{ip}$, i = 1, 2, ..., k, and the MLE's for the common probability are

(3)    $\hat p_{ij} = \frac{\sum_{j=1}^p X_{ij}}{\sum_{j=1}^p n_j}, \quad i = 1, 2, \ldots, k.$

We need to estimate only k - 1 of these probabilities, since the last can be
estimated from the sum 1. Thus the rv

$V = \sum_{j=1}^p \sum_{i=1}^k \frac{(X_{ij} - n_j \hat p_{ij})^2}{n_j \hat p_{ij}}$

is approximately $\chi^2$ with p(k - 1) - (k - 1) = (p - 1)(k - 1) d.f. We reject
$H_0$ at level α if the computed value of V is $> \chi^2_{(p-1)(k-1),\,\alpha}$.
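
The statistic with estimated cell probabilities can be sketched as follows. This is an illustration under stated assumptions, not the book's procedure verbatim: the function name is invented, and the small table of counts is hypothetical (it is not taken from any example in the text).

```python
def homogeneity_chi2(counts):
    """counts[i][j] = number of observations from population j falling in cell A_i.

    Returns the chi-square statistic with pooled (MLE) cell probabilities
    and its degrees of freedom (p - 1)(k - 1).
    """
    k = len(counts)            # number of cells A_1, ..., A_k
    p = len(counts[0])         # number of populations
    n = [sum(counts[i][j] for i in range(k)) for j in range(p)]  # sample sizes n_j
    total = sum(n)
    # Pooled estimate of the common probability of cell A_i under H0.
    p_hat = [sum(counts[i]) / total for i in range(k)]
    v = 0.0
    for i in range(k):
        for j in range(p):
            expected = n[j] * p_hat[i]
            v += (counts[i][j] - expected) ** 2 / expected
    return v, (p - 1) * (k - 1)


if __name__ == "__main__":
    # Hypothetical counts: two cells, two populations of 30 observations each.
    print(homogeneity_chi2([[10, 20], [20, 10]]))
```

When the cell probabilities are fully specified by the hypothesis, the expected counts would instead be computed from those known probabilities.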


Example 1. Four samples are taken from a table of random normal
numbers (Table 6, page 659) with μ = 0 and σ = 1.

Sample 1    1.221, -0.439,  1.291,  0.541, -1.661,  0.665,
Sample 2    1.119, -0.792,  0.063,  0.484,  1.045,  0.084,
Sample 3    0.004, -1.275, -1.793, -0.986, -1.363, -0.880,
Sample 4   -2.015, -0.623, -0.699,  0.481, -0.586, -0.579,
Sample 1    0.340,  0.008,  0.110,  1.297, -0.556, -1.181,
Sample 2   -0.086,  0.427, -0.528, -1.433,  2.923, -1.190,
Sample 3   -0.158,  0.831, -0.813, -1.345,  0.500, -0.318,
Sample 4   -0.120,  0.191, -0.071, -3.001,  0.359, -0.094,
Sample 1   -0.518,  0.843,  0.584, -0.431, -0.135, -0.732,
Sample 2    0.192,  0.942,  1.216,  1.703, -0.145, -0.066,
Sample 3   -0.432,  1.045,  0.733,  1.164, -0.498,  1.006,
Sample 4    1.501,  0.031,  0.402,  0.884,  0.457, -0.798,
Sample 1    1.049,  2.040,  0.116, -1.616,  1.381, -0.394.
Sample 2    1.810, -0.124.
Sample 3    2.885,  0.196, -1.272,  1.262, -0.281,  1.707,
Sample 4   -0.768,  0.023.
Sample 3    0.580.

Here $n_1 = 24$, $n_2 = 20$, $n_3 = 25$, $n_4 = 20$. Let
$A_1 = (-\infty, -.75]$, $A_2 = (-.75, 0]$, $A_3 = (0, .75]$, and
$A_4 = (.75, +\infty)$.
The observed frequencies $X_{ij}$ are as follows:

i     X_{i1}    X_{i2}    X_{i3}    X_{i4}
1       3         3         9         4
2       7         5         5         6
3       7         5         5         8
4       7         7         6         2

The following table gives the probabilities $p_{ij}$, j = 1, 2, 3, 4:

i:         1        2        3        4
p_{ij}:    .2266    .2734    .2734    .2266

and the expected frequencies $n_j p_{ij}$ are as follows:

i \ j      1        2        3        4
1        5.438    4.532    5.665    4.532
2        6.562    5.468    6.835    5.468
3        6.562    5.468    6.835    5.468
4        5.438    4.532    5.665    4.532


Thus

$v = \sum_{j=1}^4 \sum_{i=1}^4 \frac{(x_{ij} - n_j p_{ij})^2}{n_j p_{ij}} = 7.84.$

Since $\chi^2_{9,.05} = 16.9$, we accept $H_0$.

If we pool $A_1$ with $A_2$ and $A_3$ with $A_4$ (because some class frequencies
are < 5), we get v = 1.82. Since $\chi^2_{3,.05} = 7.81$, we again accept $H_0$ at the
.05 level.
Example 2. Three samples are taken from U[0, 1]: 


Sample 1 .59, .96, .78, .16, .87, .13, .73, .41, .74, .09, .46, .56, 
Sample 2 .25, .91, .26, .77, .55, .72, .47, .43, .31, .56, .50, .96,
Sample 3 .22, .90, .78, .66, .18, .73, .43, .58, М, .16, .47, .07, 
Sample 1 .75, .10, .36, .68, .17, .52, .70, .70, .64, .61, .18, .64, .64.
Sample 2 .87, .46, .26, .60, .59, .11, .02, .33, .66, .70, .32, .02, .01.
Sample 3 .30, .40, .11, .72, .83, .09, .25, .77, .72, .33, .21, .53, .72.


To test the null hypothesis that all the samples came from U[0, 1], let us 
choose 


$A_1 = [0, .25)$, $A_2 = [.25, .50)$, $A_3 = [.50, .75)$, $A_4 = [.75, 1.00]$.
Then the observed frequencies are as follows:

i     X_{i1}    X_{i2}    X_{i3}
1       6         4         8
2       3         9         6
3      12         8         7
4       4         4         4

The expected frequencies $n_j p_{ij}$ are these:

i \ j     1       2       3
1        6.25    6.25    6.25
2        6.25    6.25    6.25
3        6.25    6.25    6.25
4        6.25    6.25    6.25

Thus
$v = \sum_{j=1}^3 \sum_{i=1}^4 \frac{(x_{ij} - n_j p_{ij})^2}{n_j p_{ij}} = \frac{78.25}{6.25} = 12.52.$

Comparing this result with the tabulated value of chi-square for 6 d.f., we
see that $12.52 < \chi^2_{6,.05} = 12.6$, so that we cannot reject $H_0$.
The Kolmogorov-Smirnov Test

Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples
from continuous df's F and G, respectively. Let the order statistics be

$X_{(1)}, X_{(2)}, \ldots, X_{(m)}$ and $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$.
Let us write for the empirical df's

(4)    $F_m^*(x) = \begin{cases} 0, & x < X_{(1)}, \\ k/m, & X_{(k)} \le x < X_{(k+1)}, \; k = 1, 2, \ldots, m - 1, \\ 1, & x \ge X_{(m)}, \end{cases}$

and

(5)    $G_n^*(x) = \begin{cases} 0, & x < Y_{(1)}, \\ k/n, & Y_{(k)} \le x < Y_{(k+1)}, \; k = 1, 2, \ldots, n - 1, \\ 1, & x \ge Y_{(n)}. \end{cases}$

In a combined ordered arrangement of m X's and n Y's, $F_m^*$ and $G_n^*$ represent
the respective proportions of X and Y values that do not exceed x. Under
$H_0$: F(x) = G(x) for all x, we expect a reasonable agreement between the
two sample df's. We define


(6)    $D_{m,n} = \sup_x |F_m^*(x) - G_n^*(x)|.$

Then $D_{m,n}$ may be used to test $H_0$ against the two-sided alternative $H_1$:
$F(x) \ne G(x)$ for some x. The test rejects $H_0$ at level α if
(7)    $D_{m,n} \ge D_{m,n,\alpha},$
where $P_{H_0}\{D_{m,n} \ge D_{m,n,\alpha}\} \le \alpha$.

Similarly, one can define the one-sided statistics
(8)    $D_{m,n}^+ = \sup_x [F_m^*(x) - G_n^*(x)]$
and
(9)    $D_{m,n}^- = \sup_x [G_n^*(x) - F_m^*(x)],$

to be used against the one-sided alternatives

(10)    $G(x) \le F(x)$ for all x and $G(x) < F(x)$ for some x
with rejection region $D_{m,n}^+ \ge D_{m,n,\alpha}^+$
and

(11)    $F(x) \le G(x)$ for all x and $F(x) < G(x)$ for some x
with rejection region $D_{m,n}^- \ge D_{m,n,\alpha}^-$, respectively.


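For small samples tables due to Massey [79] are available. In Table 9,
page 663, we give the values of $D_{m,n,\alpha}$ and $D_{m,n,\alpha}^+$ for some selected values of
m, n, and α. Table 8 gives the corresponding values for the case m = n.
For large samples we use the limiting result due to Smirnov [120]. Let
N = mn/(m + n). Then

(12)    $\lim_{m,n\to\infty} P\{\sqrt{N}D_{m,n}^+ \le \lambda\} = \begin{cases} 1 - e^{-2\lambda^2}, & \lambda > 0, \\ 0, & \lambda \le 0, \end{cases}$

(13)    $\lim_{m,n\to\infty} P\{\sqrt{N}D_{m,n} \le \lambda\} = \begin{cases} \sum_{j=-\infty}^{\infty} (-1)^j e^{-2j^2\lambda^2}, & \lambda > 0, \\ 0, & \lambda \le 0. \end{cases}$

Relations (12) and (13) give the distribution of $D_{m,n}^+$ and $D_{m,n}$, respectively,
under $H_0$: F(x) = G(x) for all $x \in \mathcal{R}$.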


Example 3. The following data represent the lifetimes (hours) of batteries
for two different brands:

Brand A: 40, 30, 40, 45, 55, 30
Brand B: 50, 50, 45, 55, 60, 40

Are these brands different with respect to average life?
Let us first apply the Kolmogorov-Smirnov test to test $H_0$ that the
population distribution of length of life for the two brands is the same.

x      F_6^*(x)    G_6^*(x)    |F_6^*(x) - G_6^*(x)|
30       2/6          0              2/6
40       4/6         1/6             3/6
45       5/6         2/6             3/6
50       5/6         4/6             1/6
55        1          5/6             1/6
60        1           1               0

$D_{6,6} = \sup_x |F_6^*(x) - G_6^*(x)| = \frac36.$

From Table 8, page 662, we obtain the critical value $D_{6,6,.05}$ for m = n = 6 at
level α = .05. Since $D_{6,6} < D_{6,6,.05}$, we accept $H_0$ that the population
distribution for the length of life for the two brands is the same.

Let us next apply the two-sample t-test. We have $\bar x = 40$, $\bar y = 50$,
$s_1^2 = 90$, $s_2^2 = 50$, $s_p^2 = 70$. Thus

$t = \frac{40 - 50}{\sqrt{70}\sqrt{\frac16 + \frac16}} = -2.08.$

Since $t_{10,.025} = 2.2281$, we accept the hypothesis that the two samples come
from the same (normal) population.
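
The Kolmogorov-Smirnov part of Example 3 can be reproduced with a few lines of code. The sketch below is illustrative only (the function names are assumptions): it evaluates both empirical df's at every observed value, where the supremum of the difference of two step functions is attained, and also evaluates a truncated version of the series in (13).

```python
import math
from fractions import Fraction


def ecdf(sample, x):
    """Proportion of sample values that do not exceed x."""
    return Fraction(sum(1 for s in sample if s <= x), len(sample))


def ks_two_sample(xs, ys):
    """Return (D, D_plus, D_minus) for samples xs (df F) and ys (df G)."""
    points = sorted(set(xs) | set(ys))
    diffs = [ecdf(xs, t) - ecdf(ys, t) for t in points]
    # Both empirical df's are 0 to the left of all data, so each sup is >= 0.
    d_plus = max(max(diffs), Fraction(0))
    d_minus = max(max(-d for d in diffs), Fraction(0))
    return max(d_plus, d_minus), d_plus, d_minus


def smirnov_limit_cdf(lam, terms=100):
    """Truncated series (13): limiting P{sqrt(N) D <= lam}."""
    if lam <= 0:
        return 0.0
    return sum((-1) ** j * math.exp(-2 * j * j * lam * lam)
               for j in range(-terms, terms + 1))


if __name__ == "__main__":
    brand_a = [40, 30, 40, 45, 55, 30]
    brand_b = [50, 50, 45, 55, 60, 40]
    print(ks_two_sample(brand_a, brand_b))
```

Exact fractions are used so that the statistic can be compared directly with tabulated critical values of the form k/6.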


The Median Test 


A frequently used test, which is essentially a test of the equality of medians
of two independent distributions, is the median test. This test will tend to
accept $H_0$: F = G even if the shapes of F and G are different as long as the
medians are the same.

The test is simple. The combined sample $X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n$
is ordered and a median is found. If m + n is odd, the median is the
[(m + n + 1)/2]th value in the ordered arrangement. If m + n is even, the
median is any number between the two middle values. Let V be the number
of observed values of X that are $\le$ the sample median for the combined
sample. If V is large, it is reasonable to conclude that the actual median of
X is smaller than the median of Y. One therefore rejects $H_0$: F = G in favor
of $H_1$: $F(x) \ge G(x)$ for all x and F(x) > G(x) for some x if V is too large,
that is, if $V \ge c$. If, however, the alternative is $F(x) \le G(x)$ for all x and
F(x) < G(x) for some x, the median test rejects $H_0$ if $V \le c$.

For the two-sided alternative that $F(x) \ne G(x)$ for some x, we use the
two-sided test.

We next compute the null distribution of the rv V. If m + n = 2p, p a
positive integer, then

$P_{H_0}\{V = v\} = P_{H_0}\{\text{Exactly } v \text{ of the } X_i\text{'s are} \le \text{combined median}\}$

(14)    $= \begin{cases} \dfrac{\binom{m}{v}\binom{n}{p-v}}{\binom{m+n}{p}}, & v = 0, 1, \ldots, \min(m, p), \\ 0, & \text{otherwise.} \end{cases}$

Here $0 \le V \le \min(m, p)$. If m + n = 2p + 1, p > 0, is an integer, the
[(m + n + 1)/2]th value is the median in the combined sample, and

$P_{H_0}\{V = v\} = P\{\text{Exactly } v \text{ of the } X_i\text{'s are below the } (p + 1)\text{th value in the ordered arrangement}\}$

(15)    $= \begin{cases} \dfrac{\binom{m}{v}\binom{n}{p-v}}{\binom{m+n}{p}}, & v = 0, 1, \ldots, \min(m, p), \\ 0, & \text{otherwise.} \end{cases}$

Remark 1. Under $H_0$ we expect (m + n)/2 observations above the median
and (m + n)/2 below the median. One can therefore apply the chi-square
test with 1 d.f. to test $H_0$ against the two-sided alternative.


Example 4. For the data of Example 3 the combined sample in ordered
form is as follows: 30 < 30 < 40 < 40 < 40 < 45 < 45 < 50 < 50 < 55 <
55 < 60. Since m = n = 6, m + n = 12 is even and the median is 45. Thus

v = number of observed values of X that are less than or equal to 45
  = 5.

Now, from (14) with m = n = p = 6,

$P_{H_0}\{V \ge 5\} = \frac{\binom65\binom61 + \binom66\binom60}{\binom{12}{6}} = \frac{37}{924} = .04.$

Since $P_{H_0}\{V \ge 5\} > .025$, we accept $H_0$ that the two samples come from
the same population.


The Mann-Whitney-Wilcoxon Test 


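Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from two
continuous df's, F and G, respectively. We define

(16)    $Z_{ij} = \begin{cases} 1, & X_i < Y_j, \\ 0, & X_i > Y_j, \end{cases} \quad i = 1, 2, \ldots, m, \; j = 1, 2, \ldots, n,$
and write
(17)    $U = \sum_{i=1}^m \sum_{j=1}^n Z_{ij}.$

Note that $\sum_{j=1}^n Z_{ij}$ is the number of $Y_j$'s that are larger than $X_i$, and hence
U is the total number of pairs $(X_i, Y_j)$ with $X_i < Y_j$; equivalently, U is obtained
by counting, for each of $Y_1, Y_2, \ldots, Y_n$, the number of values of $X_1, X_2, \ldots, X_m$
that are smaller, and adding these counts. The statistic U is called the
Mann-Whitney statistic and was first encountered in Section 13.2.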


Example 5. Let m = 4, n = 3, and suppose that the combined sample
when ordered is as follows:

$x_2 < x_1 < y_3 < y_2 < x_3 < y_1 < x_4.$
Then U = 7, since there are three values of $x < y_1$, two values of $x < y_2$, and
two values of $x < y_3$.


Note that U = 0 if all the $X_i$'s are larger than all the $Y_j$'s and U = mn if all
the $X_i$'s are smaller than all the $Y_j$'s, because then there are m X's $< Y_1$, m
X's $< Y_2$, and so on. Thus $0 \le U \le mn$. If U is large, the values of Y tend
to be larger than the values of X (Y is stochastically larger than X), and
this supports the alternative $F(x) \ge G(x)$ for all x and F(x) > G(x) for some
x. Similarly, if U is small, the Y values tend to be smaller than the X values,
and this supports the alternative $F(x) \le G(x)$ for all x and F(x) < G(x)
for some x. We summarize these results as follows.

H_0        H_1           Reject H_0 if
F = G      F >= G        U >= c_1
F = G      F <= G        U <= c_2
F = G      F != G        U >= c_3 or U <= c_4

To compute the critical values we need the null distribution of U. Let
(18)    $p_{m,n}(u) = P_{H_0}\{U = u\}.$


We will set up a difference equation relating $p_{m,n}$ to $p_{m-1,n}$ and $p_{m,n-1}$. If the
observations are arranged in increasing order of magnitude, the largest value
can be either an x value or a y value. Under $H_0$, all m + n values are
equally likely, so that the probability that the largest value will be an x value
is m/(m + n) and that it will be a y value is n/(m + n).

Now, if the largest value is an x, it does not contribute to U, and the
remaining m - 1 values of x and n values of y can be arranged to give the
observed value U = u with probability $p_{m-1,n}(u)$. If the largest value is a y,
this value is larger than all the m x's. Thus, to get U = u, the remaining
n - 1 values of y and m values of x contribute U = u - m. It follows that
(19)    $p_{m,n}(u) = \frac{m}{m+n}\,p_{m-1,n}(u) + \frac{n}{m+n}\,p_{m,n-1}(u - m).$
If m = 0, then for $n \ge 1$
(20)    $p_{0,n}(u) = \begin{cases} 1 & \text{if } u = 0, \\ 0 & \text{if } u > 0. \end{cases}$
If n = 0, $m \ge 1$, then
(21)    $p_{m,0}(u) = \begin{cases} 1 & \text{if } u = 0, \\ 0 & \text{if } u > 0, \end{cases}$
and
(22)    $p_{m,n}(u) = 0$ if u < 0, $m \ge 0$, $n \ge 0$.

For small values of m and n one can easily compute the pmf of U. Thus,
if m = n = 1, then
$p_{1,1}(0) = \frac12, \quad p_{1,1}(1) = \frac12.$
If m = 1, n = 2, then
$p_{1,2}(0) = p_{1,2}(1) = p_{1,2}(2) = \frac13.$
Tables for critical values are available for small values of m and n, $m \le n$.
See, for example, Auble [4] or Mann and Whitney [78]. Table 11 on page
666 gives the values of $u_\alpha$ for which $P_{H_0}\{U > u_\alpha\} \le \alpha$ for some selected
values of m, n, and α.
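
The difference equation (19) with boundary conditions (20) to (22) translates directly into a memoized recursion. The following sketch is illustrative (the function name is an assumption) and is practical only for small m and n.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def mw_pmf(m, n, u):
    """p_{m,n}(u) = P{U = u} under H0, computed from the recursion (19)."""
    if u < 0:
        return Fraction(0)                              # condition (22)
    if m == 0 or n == 0:
        return Fraction(1) if u == 0 else Fraction(0)   # conditions (20), (21)
    return (Fraction(m, m + n) * mw_pmf(m - 1, n, u)
            + Fraction(n, m + n) * mw_pmf(m, n - 1, u - m))


if __name__ == "__main__":
    print([mw_pmf(1, 2, u) for u in range(3)])
    print([mw_pmf(4, 3, u) for u in range(13)])
```

Summing the upper tail of this pmf gives exact critical values of the kind tabulated for small samples.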
For large values of m and n, one can use the central limit theorem. We
have

(23)    $E_{H_0}U = \sum_{i=1}^m \sum_{j=1}^n E_{H_0}Z_{ij} = \sum_{i=1}^m \sum_{j=1}^n P_{H_0}\{X_i < Y_j\} = \frac{mn}{2}.$
Also
(24)    $E_{H_0}U^2 = \sum_{i=1}^m \sum_{j=1}^n \sum_{h=1}^m \sum_{k=1}^n E_{H_0}(Z_{ij}Z_{hk})$
$= \sum_{i}\sum_{j} E_{H_0}Z_{ij}^2 + \sum_{i}\sum_{j \ne k} E_{H_0}(Z_{ij}Z_{ik}) + \sum_{j}\sum_{i \ne h} E_{H_0}(Z_{ij}Z_{hj}) + \sum_{i \ne h}\sum_{j \ne k} E_{H_0}(Z_{ij}Z_{hk}).$
Now

(25)    $E_{H_0}Z_{ij}^2 = P_{H_0}\{X_i < Y_j\} = \frac12,$
and for $j \ne k$
$E_{H_0}(Z_{ij}Z_{ik}) = P_{H_0}\{Z_{ij} = 1, Z_{ik} = 1\}$
$= P_{H_0}\{X_i < Y_j, X_i < Y_k\}$
(26)    $= \frac13,$
since a designated one of three terms is smaller than the other two. Similarly
(27)    $E_{H_0}(Z_{ij}Z_{hj}) = P_{H_0}\{X_i < Y_j, X_h < Y_j\} = \frac13, \quad i \ne h,$
and for $k \ne j$, $h \ne i$,

$E_{H_0}(Z_{ij}Z_{hk}) = P_{H_0}\{X_i < Y_j, X_h < Y_k\}$
$= P_{H_0}\{X_i < Y_j\}\,P_{H_0}\{X_h < Y_k\}$
(28)    $= \frac14.$

It follows that

(29)    $E_{H_0}U^2 = \frac{mn}{2} + \frac{mn(n-1)}{3} + \frac{mn(m-1)}{3} + \frac{mn(m-1)(n-1)}{4}$
and
(30)    $\operatorname{var}_{H_0}(U) = \frac{mn(m+n+1)}{12}.$

The statistic U is the sum of mn identically distributed (but not independent)
rv's. A generalized version of the CLT shows (see, for example, Mann and
Whitney [78]) that under $H_0$ the statistic

$Z = \frac{U - mn/2}{\sqrt{mn(m+n+1)/12}}$

tends to the standard normal distribution as $m, n \to \infty$, provided that m/n
remains constant. The approximation is fairly good for $m, n \ge 8$.

Example 6. Two samples are as follows:

Values of $X_i$: 1, 2, 3, 5, 7, 9, 11, 18
Values of $Y_j$: 4, 6, 8, 10, 12, 13, 14, 15, 19

Thus m = 8, n = 9, and U = 3 + 4 + 5 + 6 + 7 + 7 + 7 + 7 + 8 = 54.
Also,

$E_{H_0}U = 36, \quad \operatorname{var}_{H_0}(U) = \frac{8 \cdot 9}{12}(8 + 9 + 1) = 108,$

and
$Z = \frac{54 - 36}{\sqrt{108}} = \sqrt3 = 1.732.$
Since 1.732 > 1.65, we reject $H_0$ that the two samples came from the same
population at level α = .10.
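
The computation of U and its normal approximation in Example 6 can be sketched as below. Two things are assumptions here rather than facts from the text: the function names, and the fourth X value, which is illegible in the source and is taken to be 5 because that is the only integer consistent with the reported value 1.732.

```python
import math


def mann_whitney_u(xs, ys):
    """U = number of pairs (x, y) with x < y."""
    return sum(1 for x in xs for y in ys if x < y)


def u_normal_approx(u, m, n):
    """Null mean mn/2, null variance mn(m+n+1)/12, and the standardized value."""
    mean = m * n / 2
    var = m * n * (m + n + 1) / 12
    return mean, var, (u - mean) / math.sqrt(var)


if __name__ == "__main__":
    xs = [1, 2, 3, 5, 7, 9, 11, 18]   # fourth value (5) is an assumed reading
    ys = [4, 6, 8, 10, 12, 13, 14, 15, 19]
    u = mann_whitney_u(xs, ys)
    print(u, u_normal_approx(u, len(xs), len(ys)))
```

Any value strictly between 4 and 6 in place of the assumed 5 would give the same U, since only the ordering of the combined sample matters.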


PROBLEMS 13.4 


1. For the data of Example 6 apply the median test.


2. Twelve 4-year-old boys and twelve 4-year-old girls were observed during two
15-minute play sessions, and each child's play during these two periods was scored
as follows for incidence and degree of aggression:


Boys: 86, 69, 72, 65, 113, 65, 118, 45, 141, 104, 41, 50 
Girls: 55, 40, 22, 58, 16, 7, 9, 16, 26, 36, 20, 15 


Test the hypothesis that there were sex differences in the amount of aggression 
shown, using (a) the median test and (b) the Mann-Whitney-Wilcoxon test. 
(Siegel [115]) 


3. To compare the variability of two brands of tires, the following mileages (1000 
miles) were obtained for eight tires of each kind: 


Brand A: 32.1, 20.6, 17.8, 28.4, 19.6, 21.4, 19.9, 30.1
Brand B: 19.8, 27.6, 30.8, 27.6, 34.1, 18.7, 16.9, 17.9


Test the null hypothesis that the two samples come from the same population,
using the Mann-Whitney-Wilcoxon test.


4. Use the data of Problem 2 to apply the Kolmogorov-Smirnov test.
5. Apply the Kolmogorov-Smirnov test to the data of Problem 3.


13.5 TESTS OF INDEPENDENCE


Let X and Y be two rv's with joint df F(x, y), and let $F_1$ and $F_2$, respectively,
be the marginal df's of X and Y. In this section we study some tests of the
hypothesis of independence, namely,

$H_0$: $F(x, y) = F_1(x)F_2(y)$ for all $(x, y) \in \mathcal{R}_2$
against the alternative

$H_1$: $F(x, y) \ne F_1(x)F_2(y)$ for some (x, y).

If the joint distribution function F is bivariate normal, we know that X and
Y are independent if and only if the correlation coefficient ρ = 0. In this
case, the test of independence is to test $H_0$: ρ = 0. (See Remark 12.3.5.)


In the nonparametric situation the most commonly used test of independence
is the chi-square test, which we now study.


Chi-square Test of Independence—Contingency Tables 


Let X and Y be two rv's, and suppose that we have n observations on
(X, Y). Let us divide the space of values assumed by X (the real line) into r
mutually exclusive intervals $A_1, A_2, \ldots, A_r$. Similarly, the space of values of
Y is divided into c disjoint intervals $B_1, B_2, \ldots, B_c$. As a rule of thumb, we
choose the length of each interval in such a way that the probability that
X (Y) lies in an interval is approximately 1/r (1/c). Moreover, it is desirable
to have n/r and n/c at least equal to 5. Let $X_{ij}$ denote the number of pairs
$(X_k, Y_k)$, k = 1, 2, ..., n, that lie in $A_i \times B_j$, and let

(1)    $p_{ij} = P\{(X, Y) \in A_i \times B_j\} = P\{X \in A_i \text{ and } Y \in B_j\},$

where i = 1, 2, ..., r, j = 1, 2, ..., c. If each $p_{ij}$ is known, the quantity

(2)    $\sum_{i=1}^r \sum_{j=1}^c \frac{(X_{ij} - np_{ij})^2}{np_{ij}}$

has approximately a chi-square distribution with rc - 1 d.f., provided that
n is large. (See Theorem 10.3.2.) If X and Y are independent, $P\{(X, Y)
\in A_i \times B_j\} = P\{X \in A_i\}P\{Y \in B_j\}$. Let us write $p_{i\cdot} = P\{X \in A_i\}$ and $p_{\cdot j}
= P\{Y \in B_j\}$. Then under $H_0$: $p_{ij} = p_{i\cdot}p_{\cdot j}$, i = 1, 2, ..., r, j = 1, 2, ..., c.

In practice, $p_{ij}$ will not be known. We replace $p_{ij}$ by their estimates. Under
$H_0$, we estimate $p_{i\cdot}$ by

(3)    $\hat p_{i\cdot} = \frac{\sum_{j=1}^c X_{ij}}{n}, \quad i = 1, 2, \ldots, r,$
and $p_{\cdot j}$ by
(4)    $\hat p_{\cdot j} = \frac{\sum_{i=1}^r X_{ij}}{n}, \quad j = 1, 2, \ldots, c.$

Since $\sum_{j=1}^c \hat p_{\cdot j} = 1 = \sum_{i=1}^r \hat p_{i\cdot}$, we have estimated only r - 1 + c - 1 =
r + c - 2 parameters. It follows (see Theorem 10.3.4) that the rv

(5)    $U = \sum_{i=1}^r \sum_{j=1}^c \frac{(X_{ij} - n\hat p_{i\cdot}\hat p_{\cdot j})^2}{n\hat p_{i\cdot}\hat p_{\cdot j}}$
is asymptotically distributed as $\chi^2$ with rc - 1 - (r + c - 2) = (r - 1)(c - 1)
d.f., under $H_0$. The null hypothesis is rejected if the computed value
of U exceeds $\chi^2_{(r-1)(c-1),\,\alpha}$.

It is frequently convenient to list the observed and expected frequencies of
the rc events $A_i \times B_j$ in an r x c table, called a contingency table, as follows:

Observed Frequencies, $O_{ij}$
         B_1          B_2          ...    B_c          Total
A_1      X_11         X_12         ...    X_1c         Σ_j X_1j
A_2      X_21         X_22         ...    X_2c         Σ_j X_2j
...
A_r      X_r1         X_r2         ...    X_rc         Σ_j X_rj
Total    Σ_i X_i1     Σ_i X_i2     ...    Σ_i X_ic     n

Expected Frequencies, $E_{ij}$
         B_1                               B_2                               ...    B_c                               Total
A_1      $n\hat p_{1\cdot}\hat p_{\cdot 1}$    $n\hat p_{1\cdot}\hat p_{\cdot 2}$    ...    $n\hat p_{1\cdot}\hat p_{\cdot c}$    $n\hat p_{1\cdot}$
A_2      $n\hat p_{2\cdot}\hat p_{\cdot 1}$    $n\hat p_{2\cdot}\hat p_{\cdot 2}$    ...    $n\hat p_{2\cdot}\hat p_{\cdot c}$    $n\hat p_{2\cdot}$
...
A_r      $n\hat p_{r\cdot}\hat p_{\cdot 1}$    $n\hat p_{r\cdot}\hat p_{\cdot 2}$    ...    $n\hat p_{r\cdot}\hat p_{\cdot c}$    $n\hat p_{r\cdot}$
Total    $n\hat p_{\cdot 1}$                  $n\hat p_{\cdot 2}$                  ...    $n\hat p_{\cdot c}$                  n

Note that the $X_{ij}$'s in the table are frequencies. Once the category $A_i \times B_j$
is determined for an observation (X, Y), numerical values of X and Y are
irrelevant. Next, we need to compute the expected frequency table. This is
done quite simply by multiplying the row and column totals for each pair
(i, j) and dividing the product by n. Then we compute the quantity

$\sum_{i}\sum_{j} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$

and compare it with the tabulated $\chi^2$ value. In this form the test can be
applied even to qualitative data. $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_c$ represent
the two attributes, and the null hypothesis to be tested is that the attributes
A and B are independent.

Example 1. The following are the results for a random sample of 400
employed individuals:

Time (years)                Annual Income (dollars)
with the        Less than                      More than
same company    8000         8000-15,000       15,000        Total
< 5             50           75                25            150
5-10            25           50                25            100
10 or more      25           75                50            150
Total           100          200               100           400

If X denotes the length of service with the same company, and Y, the
annual income, we wish to test the hypothesis that X and Y are independent.
The expected frequencies are as follows:

Time (years)                Expected Frequencies
with the
same company    < 8,000      8,000-15,000      > 15,000      Total
< 5             37.5         75                37.5          150
5-10            25           50                25            100
> 10            37.5         75                37.5          150
Total           100          200               100           400

Thus
$U = \frac{(12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5} + 0 + 0 + 0 + \frac{(12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5}$
= 16.66.

The number of degrees of freedom is (3 - 1)(3 - 1) = 4, and $\chi^2_{4,.05}$
= 9.488. Since 16.66 > 9.488, we reject $H_0$ at level .05 and conclude that
length of service with a company is not independent of annual income.
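
The expected-frequency table and the statistic of Example 1 can be recomputed with a short sketch like the one below. The function name is an assumption; the counts are those of the observed table in the example.

```python
def independence_chi2(table):
    """Chi-square statistic for an r x c table of observed frequencies.

    Expected frequency for cell (i, j) = (row total i) * (column total j) / n.
    Returns (statistic, degrees of freedom, expected table).
    """
    r, c = len(table), len(table[0])
    row = [sum(table[i]) for i in range(r)]
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row)
    expected = [[row[i] * col[j] / n for j in range(c)] for i in range(r)]
    stat = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(r) for j in range(c))
    return stat, (r - 1) * (c - 1), expected


if __name__ == "__main__":
    observed = [[50, 75, 25],
                [25, 50, 25],
                [25, 75, 50]]
    stat, df, expected = independence_chi2(observed)
    print(round(stat, 2), df, expected)
```

Only the four corner cells contribute, each by $(12.5)^2/37.5$, which is why the statistic is four times a single term.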


Kendall's tau
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population.

Definition 1. For any two pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ we say that the relation
is perfect concordance (or agreement) if

(6)    $X_i < X_j$ whenever $Y_i < Y_j$    or    $X_i > X_j$ whenever $Y_i > Y_j$
and that the relation is perfect discordance (disagreement) if
(7)    $X_i > X_j$ whenever $Y_i < Y_j$    or    $X_i < X_j$ whenever $Y_i > Y_j$.

Writing $\pi_c$ and $\pi_d$ for the probability of perfect concordance and of perfect
discordance, respectively, we have

(8)    $\pi_c = P\{(X_j - X_i)(Y_j - Y_i) > 0\}$
and

(9)    $\pi_d = P\{(X_j - X_i)(Y_j - Y_i) < 0\},$

and, if the marginal distributions of X and Y are continuous,
$\pi_c = [P\{Y_i < Y_j\} - P\{X_i > X_j \text{ and } Y_i < Y_j\}]$
$\quad + [P\{Y_i > Y_j\} - P\{X_i < X_j \text{ and } Y_i > Y_j\}]$
(10)    $= 1 - \pi_d.$
Definition 2. The measure of association between the rv's X and Y defined
by
(11)    $\tau = \pi_c - \pi_d$

is known as Kendall's tau.


If the marginal distributions of X and Y are continuous, we may rewrite
(11), in view of (10), as follows:

(12)    $\tau = 1 - 2\pi_d = 2\pi_c - 1.$
In particular, if X and Y are independent and continuous rv's, then
$P\{X_i < X_j\} = P\{X_i > X_j\} = \frac12,$
since then $X_i - X_j$ is a symmetric rv. Then
$\pi_c = P\{X_i < X_j\}P\{Y_i < Y_j\} + P\{X_i > X_j\}P\{Y_i > Y_j\}$
$= P\{X_i > X_j\}P\{Y_i < Y_j\} + P\{X_i < X_j\}P\{Y_i > Y_j\}$
$= \pi_d,$
and it follows that τ = 0 for independent continuous rv's.

Note that, in general, τ = 0 does not imply independence. However, for
the bivariate normal distribution τ = 0 if and only if the correlation
coefficient ρ, between X and Y, is 0, so that τ = 0 if and only if X and Y are
independent (Problem 6).

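In order to use τ as a test of independence, we first need to find an
estimate of τ from the sample. Let us write

(13)    $A_{ij} = \operatorname{sgn}(X_j - X_i)\operatorname{sgn}(Y_j - Y_i),$
where sgn u = -1 if u < 0, = 0 if u = 0, and = 1 if u > 0. Then
$A_{ij}$ takes values $a_{ij}$ where
(14)    $a_{ij} = \begin{cases} 1 & \text{if pairs } (X_i, Y_i) \text{ and } (X_j, Y_j) \text{ are concordant,} \\ 0 & \text{if pairs } (X_i, Y_i) \text{ and } (X_j, Y_j) \text{ are neither concordant nor discordant,} \\ -1 & \text{if pairs } (X_i, Y_i) \text{ and } (X_j, Y_j) \text{ are discordant.} \end{cases}$
Also
(15)    $P\{A_{ij} = a_{ij}\} = \begin{cases} \pi_c & \text{if } a_{ij} = 1, \\ \pi_d & \text{if } a_{ij} = -1, \\ 1 - \pi_c - \pi_d & \text{if } a_{ij} = 0, \end{cases}$
and
(16)    $EA_{ij} = \pi_c - \pi_d = \tau.$
Thus $A_{ij}$ is an unbiased estimate for τ. Note that $a_{ij} = a_{ji}$ and $a_{ii} = 0$, so that
(17)    $T = \binom{n}{2}^{-1} \sum\sum_{1 \le i < j \le n} A_{ij}$

is an unbiased estimate of τ.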


Definition 3. The statistic T defined in (17) is known as Kendall's sample 
tau coefficient. 


The statistic T may be written in several equivalent forms, but we will put
it into a form that is convenient for computation. Let us write

P = number of positive $A_{ij}$'s,
N = number of negative $A_{ij}$'s,    $1 \le i < j \le n$.

Then

(18)    $T = \frac{P - N}{\binom{n}{2}}.$

If there are no ties within the X or the Y observations, $A_{ij} \ne 0$ for $i \ne j$ and
$P + N = \binom{n}{2}$. In that case

(19)    $T = \frac{2P}{\binom{n}{2}} - 1 = 1 - \frac{2N}{\binom{n}{2}}.$

Note that $-1 \le T \le 1$.

To test $H_0$: X and Y are independent against $H_1$: X and Y are dependent,
we reject $H_0$ if |T| is large. Under $H_0$, τ = 0, so that the null distribution of
T is symmetric about 0. Thus we reject $H_0$ at level α if the observed value
of T, t, satisfies $|t| > t_{\alpha/2}$, where $P\{|T| \ge t_{\alpha/2} \mid H_0\} \le \alpha$.

For small values of n the null distribution can be directly evaluated. Values
for $4 \le n \le 10$ are tabulated by Kendall [57]. Table 12 on page 667 gives
the values of $S_\alpha$ for which $P\{S > S_\alpha\} \le \alpha$, where $S = \binom{n}{2}T$, for selected
values of n and α.

For a direct evaluation of the null distribution we note that the numerical
value of T is clearly invariant under all order-preserving transformations. It
is therefore convenient to order X and Y values and assign them ranks. If
we write the pairs from the smallest to the largest according to, say, X values,
P is the number of pairs of values of $1 \le i < j \le n$ for which $Y_j - Y_i > 0$.

Example 2. Let n = 4, and let us find the null distribution of T. There are
4! different permutations of ranks of Y:

Ranks of X values:    1      2      3      4
Ranks of Y values:    a_1    a_2    a_3    a_4

where $(a_1, a_2, a_3, a_4)$ is one of the 24 permutations of 1, 2, 3, 4. Since the
distribution is symmetric about 0, we need only compute one half of the
distribution.

P     T        Number of Permutations    P_{H_0}{T = t}
0     -1.00    1                         1/24
1     -.67     3                         3/24
2     -.33     5                         5/24
3     .00      6                         6/24

Similarly, for n = 3 the distribution of T under $H_0$ is as follows:

P     T        Number of Permutations        P_{H_0}{T = t}
0     -1.00    1: (3, 2, 1)                  1/6
1     -.33     2: (2, 3, 1), (3, 1, 2)       2/6
Example 3. Two judges rank four essays as follows:

              Essay
Judge         1    2    3    4
1, X          3    4    2    1
2, Y          3    1    4    2

To test $H_0$: rankings of the two judges are independent, let us arrange the
rankings of the first judge from 1 to 4. Then we have:

Judge 1, X:   1    2    3    4
Judge 2, Y:   2    4    3    1

P = number of pairs of rankings for Judge 2 such that for j > i, $Y_j - Y_i
> 0$ = 2 [the pairs (2, 4) and (2, 3)], and

$t = \frac{2 \cdot 2}{\binom42} - 1 = -.33.$
Since
$P_{H_0}\{|T| \ge .33\} = \frac{18}{24} = .75,$
we accept $H_0$.
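
For small n the null distribution of T can be obtained by brute-force enumeration of all n! rank permutations, as in Examples 2 and 3. The sketch below is an illustration with assumed function names; exact fractions avoid rounding values such as -.33.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def sgn(u):
    return (u > 0) - (u < 0)


def kendall_t(xs, ys):
    """Sample tau: sum of sgn(x_j - x_i) sgn(y_j - y_i) over i < j, divided by C(n, 2)."""
    n = len(xs)
    s = sum(sgn(xs[j] - xs[i]) * sgn(ys[j] - ys[i])
            for i, j in combinations(range(n), 2))
    return Fraction(s, comb(n, 2))


def kendall_null_counts(n):
    """Number of permutations of the Y ranks giving each value of T (X ranks fixed)."""
    base = tuple(range(1, n + 1))
    return Counter(kendall_t(base, perm) for perm in permutations(base))


if __name__ == "__main__":
    t = kendall_t([3, 4, 2, 1], [3, 1, 4, 2])      # rankings of Example 3
    counts = kendall_null_counts(4)
    tail = sum(c for value, c in counts.items() if abs(value) >= abs(t))
    print(t, Fraction(tail, 24))
```

Enumeration grows as n!, so this approach is only a check for the small cases.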


For large n ($n \ge 8$) it can be shown (see Kendall [57]) that the statistic T
has an asymptotic normal distribution with mean 0 and variance 2(2n + 5)/
[9n(n - 1)]. Thus, for large n, the statistic

(20)    $Z = \frac{3\sqrt{n(n-1)}\,T}{\sqrt{2(2n+5)}}$

is approximately $\mathcal{N}(0, 1)$ under $H_0$.


Spearman’s Rank Correlation Coefficient 


Let (Xr Yy), (Xo Р), (Xp Y„) be a sample from a bivariate-popula- 
tion. În Section 7.3 we defined the sample correlation coefficient by 


LG- DT- Р) p. 
(Ea im- yl" 


(21) es 
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where 
У and XS wi ЗҮ, 
H i=l 


If the sample values Ху, ve ++, X, and Y, Y, =, Y, are each ranked 
from 1 to л in increasing order of magnitude separately, and if the X's and 
Y's have continuous df's, we get a unique set of rankings. The data will then 
reduce to n pairs of rankings. Let us write 

R; = tank (Х),  S; = rank (X); 


then R; and 5; € (1, 2, ~--, n). Also, 


Q2) ERA Es = MEY, 
Q3) й=тїўң,= 254, #=т1ў5= 151, 
and 
(24) Haun Bs fru D. 
п 1 


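Substituting in (21), we obtain

\[ R = \frac{12\sum_{i=1}^n (R_i - \bar R)(S_i - \bar S)}{n^3 - n}
     = \frac{12\sum_{i=1}^n R_i S_i}{n(n^2-1)} - \frac{3(n+1)}{n-1}. \tag{25} \]

Writing $D_i = R_i - S_i = (R_i - \bar R) - (S_i - \bar S)$, we have

\[ \sum_{i=1}^n D_i^2 = \sum_{i=1}^n (R_i - \bar R)^2 + \sum_{i=1}^n (S_i - \bar S)^2 - 2\sum_{i=1}^n (R_i - \bar R)(S_i - \bar S)
   = \tfrac{1}{6}\,n(n^2-1) - 2\sum_{i=1}^n (R_i - \bar R)(S_i - \bar S), \]

and it follows that

\[ R = 1 - \frac{6\sum_{i=1}^n D_i^2}{n(n^2-1)}. \tag{26} \]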
The statistic R defined in (25) or (26) is called Spearman's rank correlation
coefficient.
From (25) we see that

\[ ER = \frac{12}{n(n^2-1)}\,E\Bigl(\sum_{i=1}^n R_i S_i\Bigr) - \frac{3(n+1)}{n-1}
      = \frac{12}{n^2-1}\,E(R_i S_i) - \frac{3(n+1)}{n-1}. \tag{27} \]

Under $H_0$, the rv's X and Y are independent, so that the ranks $R_i$ and $S_i$
are also independent. It follows that

\[ E_{H_0}(R_i S_i) = ER_i\, ES_i = \Bigl(\frac{n+1}{2}\Bigr)^2, \]

and

\[ E_{H_0} R = \frac{12}{n^2-1}\Bigl(\frac{n+1}{2}\Bigr)^2 - \frac{3(n+1)}{n-1} = 0. \tag{28} \]
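Thus we should reject $H_0$ if the absolute value of R is large, that is, reject
$H_0$ if

\[ |R| > R_\alpha, \tag{29} \]

where $P_{H_0}\{|R| > R_\alpha\} \le \alpha$. To compute $R_\alpha$, we need the null distribution of
R. For this purpose it is convenient to assume, without loss of generality,
that $R_i = i$, $i = 1, 2, \ldots, n$. Then $D_i = i - S_i$, $i = 1, 2, \ldots, n$. Under $H_0$, X
and Y being independent, the n! possible sets of pairs $(i, S_i)$ of ranks are
equally likely. It follows that

\[ P_{H_0}\{R = r\} = (n!)^{-1} \times (\text{number of sets of pairs for which } R = r)
   = (n!)^{-1} \times \Bigl(\text{number of sets of pairs for which } \sum_{i=1}^n i S_i = \frac{n(n+1)[(n-1)r + 3(n+1)]}{12}\Bigr). \tag{30} \]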


Note that $-1 \le R \le 1$, and the extreme values can occur only when either
the rankings match, that is, $R_i = S_i$, in which case R = 1, or $R_i = n + 1 - S_i$,
in which case R = -1. Moreover, one need compute only one half of the
distribution, since it is symmetric about 0 (Problem 8).

In the following example we will compute the distribution of R for n = 3
and 4. The exact complete distribution of $\sum_{i=1}^n D_i^2$, and hence R, for n ≤ 10
has been tabulated by Kendall [57]. Table 13 on page 667 gives the values
of $R_\alpha$ for some selected values of n and α.


Example 4. Let us first enumerate the null distribution of R for n = 3. This
is done in the following table:

$(s_1, s_2, s_3)$    $\sum_{i=1}^n i s_i$    $r = \dfrac{12\sum_{i=1}^n i s_i}{n(n^2-1)} - \dfrac{3(n+1)}{n-1}$
(1, 2, 3)            14                      1.0
(1, 3, 2)            13                      .5
(2, 1, 3)            13                      .5

Thus

\[ P_{H_0}\{R = r\} = \begin{cases} \tfrac16, & r = 1.0, \\ \tfrac26, & r = .5, \\ \tfrac26, & r = -.5, \\ \tfrac16, & r = -1.0. \end{cases} \]

Similarly, for n = 4 we have the following:

$(s_1, s_2, s_3, s_4)$                                    $\sum i s_i$    r      $P_{H_0}\{R = r\}$
(1, 2, 3, 4)                                              30              1.0    1/24
(1, 3, 2, 4), (2, 1, 3, 4), (1, 2, 4, 3)                  29              .8     3/24
(2, 1, 4, 3)                                              28              .6     1/24
(1, 3, 4, 2), (1, 4, 2, 3), (2, 3, 1, 4), (3, 1, 2, 4)    27              .4     4/24
(1, 4, 3, 2), (3, 2, 1, 4)                                26              .2     2/24
                                                          25              .0     2/24

The last value is obtained from symmetry.
Example 5. In Example 3, we see that

\[ R = \frac{12 \times 23}{4 \times 15} - \frac{3 \times 5}{3} = -.4. \]

Since $P_{H_0}\{|R| \ge .4\} = 18/24 = .75$, we cannot reject $H_0$ at α = .05 or
α = .10.
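The two expressions (25) and (26) for R and the exact null distribution for n = 4 can be verified numerically. The sketch below is only an illustration; the helper names `spearman_r`, `spearman_r_products`, and `null_distribution` are hypothetical, and the enumeration over all n! permutations is feasible only for small n.

```python
from fractions import Fraction
from itertools import permutations

def spearman_r(r, s):
    """R = 1 - 6 * sum(D_i^2) / (n (n^2 - 1)), as in (26)."""
    n = len(r)
    d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - Fraction(6 * d2, n * (n * n - 1))

def spearman_r_products(r, s):
    """R = 12 * sum(R_i S_i) / (n (n^2 - 1)) - 3 (n + 1) / (n - 1), as in (25)."""
    n = len(r)
    sp = sum(ri * si for ri, si in zip(r, s))
    return Fraction(12 * sp, n * (n * n - 1)) - Fraction(3 * (n + 1), n - 1)

def null_distribution(n):
    """Exact distribution of R under H0, taking R_i = i without loss of generality."""
    base = tuple(range(1, n + 1))
    counts = {}
    for s in permutations(base):
        r = spearman_r(base, s)
        counts[r] = counts.get(r, 0) + 1
    total = sum(counts.values())
    return {r: Fraction(c, total) for r, c in counts.items()}

r_obs = spearman_r((1, 2, 3, 4), (2, 4, 3, 1))     # data of Example 3
dist4 = null_distribution(4)
tail = sum(p for r, p in dist4.items() if abs(r) >= abs(r_obs))
print(r_obs, tail)
```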


For large samples it is possible to use a normal approximation. It can be
shown (see, for example, Fraser [33], 247-248) that under $H_0$ the rv

\[ Z = \Bigl(12\sum_{i=1}^n R_i S_i - 3n^3\Bigr) n^{-5/2} \]

or, equivalently,

\[ Z = R\sqrt{n-1} \]

has approximately a standard normal distribution. The approximation is
good for n ≥ 10.


PROBLEMS 13.5


1. A sample of 240 men was classified according to characteristics A and B.
Characteristic A was subdivided into four classes $A_1$, $A_2$, $A_3$, and $A_4$, while B was
subdivided into three classes $B_1$, $B_2$, and $B_3$, with the following result:

         $A_1$    $A_2$    $A_3$    $A_4$    Total
$B_1$    12       25       32       11       80
$B_2$    17       18       22       23       80
$B_3$    21       17       16       26       80
Total    50       60       70       60       240


Is there evidence to support the theory that A and B are independent?


2. The following data represent the blood types and ethnic groups of a sample 
of Iraqi citizens: 


                      Blood Type
Ethnic Group     O     A     B     AB
Kurd 531 450 293 226 
Arab 174 150 133 36 
Jew 42 26 26 8 
Turkoman 47 49 22 10 
Ossetian 50 6.59 26 15 


Is there evidence to conclude that blood type is independent of ethnic group? 


3. In a public opinion poll, a random sample of 500 American adults across the
country was asked the following question: "Do you believe that there was a
concerted effort to cover up the Watergate scandal? Answer yes, no, or no
opinion." The responses according to political beliefs were as follows:


Political                 Response
Affiliation      Yes     No     No Opinion     Total
Republican       45      75     30             150
Independent      85      45     20             150
Democrat         140     30     30             200
Total            270     150    80             500


Test the hypothesis that attitude toward the Watergate cover-up is independent of
political party affiliation.


4. A random sample of 100 families in Bowling Green, Ohio, showed the following
distribution of home ownership by family income:


                       Annual Income (dollars)
Residential      Less than     7,500-      12,000
Status           7,500         12,000      or Above
Home owner       10            15          30
Renter           8             17          20


Is home ownership in Bowling Green independent of family income?


5. In a flower show the judges agreed that five exhibits were outstanding, and
these were numbered arbitrarily from 1 to 5. Three judges each arranged these
five exhibits in order of merit, giving the following rankings:

Judge A:    5    3    1    2    4
Judge B:    3    1    5    4    2
Judge C:    5    2    3    1    4


Compute the average values of Spearman's rank correlation coefficient R and
Kendall's sample tau coefficient T from the three possible pairs of rankings.

6. For the bivariate normally distributed rv (X, Y) show that τ = 0 if and only
if X and Y are independent.

(Hint: Show that $\tau = (2/\pi)\sin^{-1}\rho$, where ρ is the correlation coefficient between
X and Y.)

7. Show that т is estimable with degree m = 2. The estimate А; is unbiased for t 
with variance < 1,, and the U-statistic is given by. (17). 

8. Show that the distribution of Spearman's rank correlation coefficient R is
symmetric about 0 under $H_0$.

9. In Problem 5 test the null hypothesis that rankings of judge A and judge C are 
independent. Use both Kendall's tau and Spearman's rank correlation tests. 


10. А random sample of 12 couples showed the following distribution of heights: 


Height (inches) Height (inches) 
Couple Husband Wife Couple Husband Wife 
1 80 72. 7 74 68 
2 70 60 8 71 71 
3 73 6 76 9 63 61 
4 172 62 10 64 65 
5 62 63 11 68 66 
6 65 46 12 67 67 


(a) Compute T.

(b) Compute R.

(c) Test the hypothesis that the heights of husband and wife are independent,
using T as well as R. In each case use the normal approximation.


13.6 SOME USES OF ORDER STATISTICS 


In this section we consider some applications of order statistics. Here we
are mainly interested in three applications, namely, tolerance intervals for
distributions, coverages, and confidence interval estimates for quantiles.


Definition 1. Let F be a continuous df. A tolerance interval for F with
tolerance coefficient γ is a random interval such that the probability is γ
that this random interval covers at least a specified percentage (100p) of the
distribution.

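Let $X_1, X_2, \ldots, X_n$ be a sample of size n from F, and let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$
be the corresponding set of order statistics. If the end points of the tolerance
interval are two order statistics $X_{(r)}, X_{(s)}$, r < s, we have

\[ P\bigl\{P\{X_{(r)} < X < X_{(s)}\} \ge p\bigr\} = \gamma. \tag{1} \]

Since F is continuous, F(X) is U(0, 1), and we have

\[ P\{X_{(r)} < X < X_{(s)}\} = P\{X < X_{(s)}\} - P\{X \le X_{(r)}\}
   = F(X_{(s)}) - F(X_{(r)}) = U_{(s)} - U_{(r)}, \tag{2} \]

where $U_{(r)}, U_{(s)}$ are the order statistics from U(0, 1). Thus (1) reduces to

\[ P\{U_{(s)} - U_{(r)} \ge p\} = \gamma. \tag{3} \]

Using the joint distribution of $U_{(r)}$ and $U_{(s)}$ (see Theorem 4.5.4), we have
from (3)

\[ \gamma = \int_p^1 \int_0^{y-p} \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\, x^{r-1}(y-x)^{s-r-1}(1-y)^{n-s}\,dx\,dy \tag{4} \]

for all 0 < p < 1 and r < s. Unfortunately, (4) is not easy to solve. It is
convenient to write $U = U_{(s)} - U_{(r)}$ and let $V = U_{(s)}$. Then the joint pdf
of U and V is given by

\[ f(u, v) = \begin{cases} \dfrac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\,(v-u)^{r-1}u^{s-r-1}(1-v)^{n-s}, & 0 < u < v < 1, \\ 0, & \text{otherwise.} \end{cases} \]

Thus, on writing $v = u + t(1-u)$, the marginal pdf of U is

\[ f_1(u) = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\,u^{s-r-1}\int_u^1 (v-u)^{r-1}(1-v)^{n-s}\,dv \]
\[ \phantom{f_1(u)} = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\,u^{s-r-1}(1-u)^{n-s+r}\int_0^1 t^{r-1}(1-t)^{n-s}\,dt \]
\[ \phantom{f_1(u)} = \frac{n!}{(r-1)!\,(s-r-1)!\,(n-s)!}\,u^{s-r-1}(1-u)^{n-s+r}\,B(r, n-s+1) \]
\[ \phantom{f_1(u)} = \frac{n!}{(s-r-1)!\,(n-s+r)!}\,u^{s-r-1}(1-u)^{n-s+r}, \qquad 0 < u < 1. \tag{5} \]

Using (5) in (3), we may write γ as

\[ \gamma = \int_p^1 \frac{n!}{(s-r-1)!\,(n-s+r)!}\,u^{s-r-1}(1-u)^{n-s+r}\,du. \tag{6} \]

Now using (5.3.56), we get

\[ \gamma = \sum_{i=0}^{s-r-1}\binom{n}{i}p^i(1-p)^{n-i}, \tag{7} \]

which is much easier to use. Given n, p, γ, it may not always be possible to
find s - r to yield the exact tolerance coefficient γ.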


Example 1. Let s = n and r = 1. Then

\[ \gamma = \sum_{i=0}^{n-2}\binom{n}{i}p^i(1-p)^{n-i} = 1 - p^n - np^{n-1}(1-p). \]
Ір = .8,п = 5,r = l,then 
T —127(8) — 6.8) (2) = .181. 
Thus the interval (Ху, Ху) in this case defines an 18 percent tolerance 
interval for .80 probability under the distribution (of X). 


Example 2. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from a continuous df F.
Let us find r and s, r < s, such that $(X_{(r)}, X_{(s)})$ is a 90 percent tolerance
interval for .50 probability under F. We have

\[ .90 = \gamma = \sum_{i=0}^{s-r-1}\binom{5}{i}\Bigl(\frac{1}{2}\Bigr)^5. \]

It follows that, if we choose s - r = 4, then γ = .81; and if we choose
s - r = 5, then γ = .969. In this case, we must settle for an interval with
tolerance coefficient .969, exceeding the desired value .90.
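The binomial-sum form of the tolerance coefficient is easy to evaluate directly. The following sketch reproduces the two values quoted in Example 2; the function name `tolerance_coefficient` and its argument `width` (standing for s - r) are assumptions made for illustration.

```python
from math import comb

def tolerance_coefficient(n, p, width):
    """gamma = sum_{i=0}^{s-r-1} C(n, i) p^i (1 - p)^(n - i), where width = s - r."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(width))

g4 = tolerance_coefficient(5, 0.5, 4)   # s - r = 4
g5 = tolerance_coefficient(5, 0.5, 5)   # s - r = 5
print(round(g4, 3), round(g5, 3))
```

Scanning `width` upward until the returned value first reaches the desired γ gives the smallest usable s - r for a given n and p.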


In general, given p, 0 < p < 1, it is possible to choose a sufficiently large
sample size n and a corresponding value of s - r such that with probability
≥ γ an interval of the form $(X_{(r)}, X_{(s)})$ covers at least 100p percent of the
distribution. If s - r is specified as a function of n, one chooses the smallest
sample size n.


Example 3. Let p — $ and 7. ..75.: Suppose that we want to choose the 
smallest sample size required such that (Хо), X(,)) covers at least 75 percent 
of the distribution. Thus we want the smallest t to satisfy — } 


С. 


— Table 1, page 648, of binomial — iti orby ола ела formal 
approximation, we see that п С 14. 


The statistic $U = U_{(s)} - U_{(r)}$, for 1 ≤ r < s ≤ n, defined above is called
the coverage of the random interval $(X_{(r)}, X_{(s)})$. More generally, the
differences

\[ \begin{aligned}
U_1 &= F(X_{(1)}) = U_{(1)}, \\
U_2 &= F(X_{(2)}) - F(X_{(1)}) = U_{(2)} - U_{(1)}, \\
&\;\;\vdots \\
U_n &= F(X_{(n)}) - F(X_{(n-1)}) = U_{(n)} - U_{(n-1)}, \\
U_{n+1} &= 1 - F(X_{(n)}) = 1 - U_{(n)},
\end{aligned} \tag{8} \]

are called coverages; $U_{(s)} - U_{(r)} = F(X_{(s)}) - F(X_{(r)})$ is called the coverage
of the interval $(X_{(r)}, X_{(s)})$. Thus the coverage of a random interval (based
on order statistics) is the amount of probability in the continuous df F
contained in the random interval. In (5) we determined the pdf of
$U_{(s)} - U_{(r)} = U_{r+1} + U_{r+2} + \cdots + U_s$, 1 ≤ r < s ≤ n. Taking r = i - 1 and
s = i, we obtain the marginal pdf of $U_i$ as

\[ f(u) = \begin{cases} n(1-u)^{n-1}, & 0 < u < 1, \\ 0, & \text{otherwise.} \end{cases} \tag{9} \]

Thus $EU_i = 1/(n+1)$. This may be interpreted as follows: The order
statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ partition the area under the pdf in n + 1 parts
such that each part has the same expected area.
It is not very difficult to compute the joint pdf of $U_1, U_2, \ldots, U_n$. In fact,
the inverse transformations are as follows:

\[ \begin{aligned}
U_{(1)} &= U_1, \\
U_{(2)} &= U_1 + U_2, \\
&\;\;\vdots \\
U_{(n)} &= U_1 + U_2 + \cdots + U_n,
\end{aligned} \tag{10} \]

so that the Jacobian of the transformation is 1. Also, $u_i \ge 0$ and $\sum_{i=1}^n u_i \le 1$.
Since the joint pdf of $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ is known to be

\[ f(x_1, x_2, \ldots, x_n) = \begin{cases} n!, & \text{if } 0 < x_1 < x_2 < \cdots < x_n < 1, \\ 0, & \text{otherwise,} \end{cases} \]

it follows that the joint pdf of $U_1, U_2, \ldots, U_n$ is given by

\[ h(u_1, u_2, \ldots, u_n) = \begin{cases} n!, & u_i \ge 0,\ i = 1, 2, \ldots, n,\ \sum_{i=1}^n u_i \le 1, \\ 0, & \text{otherwise.} \end{cases} \]

Moreover, h is symmetric in $u_1, u_2, \ldots, u_n$, so that the distribution of every
sum of r, r < n, of these coverages is the same, and in particular it is the
distribution of $U_{(r)} = U_1 + U_2 + \cdots + U_r$. Thus the pdf of $\sum_{i=1}^r U_i = U_{(r)}$
is given by

\[ f(u) = \begin{cases} \dfrac{n!}{(r-1)!\,(n-r)!}\,u^{r-1}(1-u)^{n-r}, & 0 < u < 1, \\ 0, & \text{otherwise.} \end{cases} \tag{11} \]

We have

\[ EU_{(r)} = \sum_{i=1}^r EU_i = \frac{r}{n+1}. \tag{12} \]

The sum of any r successive coverages $U_{i+1}, U_{i+2}, \ldots, U_{i+r}$ is sometimes
referred to as an r-coverage. We have

\[ U_{i+1} + U_{i+2} + \cdots + U_{i+r} = U_{(i+r)} - U_{(i)}, \qquad i + r \le n. \tag{13} \]
We next consider the use of order statistics in constructing confidence
intervals for population quantiles. Let X be an rv with a continuous df F,
0 < p < 1. Then the quantile of order p satisfies

\[ F(\kappa_p) = p. \tag{14} \]

Let $X_1, X_2, \ldots, X_n$ be n independent observations on X. Then the number
of $X_i$'s < $\kappa_p$ is an rv that has a binomial distribution with parameters n and
p. Similarly, the number of $X_i$'s that are at least $\kappa_p$ has a binomial
distribution with parameters n and 1 - p.
Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics for the sample. Then

\[ P\{X_{(r)} \le \kappa_p\} = P\{\text{At least } r \text{ of the } X_i\text{'s} \le \kappa_p\}
   = \sum_{i=r}^n \binom{n}{i}p^i(1-p)^{n-i}. \tag{15} \]

Similarly,

\[ P\{X_{(s)} \ge \kappa_p\} = P\{\text{At least } n - s + 1 \text{ of the } X_i\text{'s} \ge \kappa_p\}
   = P\{\text{At most } s - 1 \text{ of the } X_i\text{'s} < \kappa_p\}
   = \sum_{i=0}^{s-1}\binom{n}{i}p^i(1-p)^{n-i}. \tag{16} \]

It follows from (15) and (16) that

\[ P\{X_{(r)} \le \kappa_p \le X_{(s)}\} = P\{X_{(r)} \le \kappa_p\} - P\{X_{(s)} < \kappa_p\}
   = P\{X_{(r)} \le \kappa_p\} - 1 + P\{X_{(s)} \ge \kappa_p\}
   = \sum_{i=r}^{s-1}\binom{n}{i}p^i(1-p)^{n-i}. \tag{17} \]

It is easy to determine a confidence interval for $\kappa_p$ from (17), once the
confidence level is given. In practice, one determines r and s such that s - r
is as small as possible, subject to the condition that the level is 1 - α.


Example 4. Suppose that we want a confidence interval for the median
(p = 1/2), based on a sample of size 7 with confidence level .90. It suffices to
find r and s, r < s, such that

\[ \sum_{i=r}^{s-1}\binom{7}{i}\Bigl(\frac{1}{2}\Bigr)^7 \ge .90. \]

By trial and error, using the probability distribution b(7, 1/2), we see that we
can choose s = 7, r = 2 or r = 1, s = 6; in either case s - r is minimum
(= 5), and the confidence level is at least .92.


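Example 5. Let us compute the number of observations required for
$(X_{(1)}, X_{(n)})$ to be a .95 level confidence interval for the median, that is, we
want to find n such that

\[ P\{X_{(1)} \le \kappa_{1/2} \le X_{(n)}\} \ge .95. \]

It suffices to find n such that

\[ \sum_{i=1}^{n-1}\binom{n}{i}\Bigl(\frac{1}{2}\Bigr)^n = 1 - \Bigl(\frac{1}{2}\Bigr)^{n-1} \ge .95. \]

It follows that n = 6.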
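The trial-and-error searches in Examples 4 and 5 amount to evaluating the binomial sum in (17). A small sketch, with the hypothetical helper names `coverage` and `smallest_n_for_range`, is:

```python
from math import comb

def coverage(n, p, r, s):
    """P{X_(r) <= kappa_p <= X_(s)} = sum_{i=r}^{s-1} C(n, i) p^i (1 - p)^(n - i)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, s))

def smallest_n_for_range(p, level):
    """Smallest n for which (X_(1), X_(n)) covers kappa_p with probability >= level."""
    n = 2
    while coverage(n, p, 1, n) < level:
        n += 1
    return n

print(coverage(7, 0.5, 2, 7), coverage(7, 0.5, 1, 6))   # Example 4
print(smallest_n_for_range(0.5, 0.95))                   # Example 5
```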


PROBLEMS 13.6


1. Find the smallest values of n such that the intervals (a) $(X_{(1)}, X_{(n)})$, (b)
$(X_{(2)}, X_{(n-1)})$ contain the median with probability ≥ .90.


2. Find the smallest sample size required such that (Xc; Xc») covers at least 90 
percent of the distribution with probability > .98. 

3. Find the relation between л and р such»that (Xi, Xem) covers at least р 
percent of the distribution with probability > 1.— p. 

4. Given γ, δ, $p_0$, $p_1$ with $p_1 > p_0$, find the smallest n such that

\[ P\{F(X_{(s)}) - F(X_{(r)}) \ge p_0\} \ge \gamma \]

and

\[ P\{F(X_{(s)}) - F(X_{(r)}) \ge p_1\} \le \delta. \]

Find also s - r.
(Hint: Use the normal approximation to the binomial distribution.)
5. In Problem 4 find the smallest n and the associated value of s - r if γ = .95,
δ = .10, $p_1$ = .75, $p_0$ = .50.


13.7 ROBUSTNESS


Most of the statistical inference problems treated in this book are parametric
in nature. We have assumed that the functional form of the distribution
being sampled is known except for a finite number of parameters. It is to be
expected that any estimate or test of hypothesis concerning the unknown
parameters constructed on this assumption will perform better than the
corresponding nonparametric procedure, provided that the underlying
assumptions are satisfied. It is therefore of interest to know how well the parametric
optimal tests or estimates constructed for one population perform when the
basic assumptions are modified. If we can construct tests or estimates that
perform well for a variety of distributions, for example, there would be little
point in using the corresponding nonparametric method unless the
assumptions are seriously violated.

In practice, one makes many assumptions in parametric inference, and any
one or all of these may be violated. Thus one seldom has accurate knowledge
about the true underlying distribution. Similarly, the assumption of mutual
independence or even identical distribution may not hold. Any test or
estimate that performs well under modifications of underlying assumptions
is usually referred to as robust. This subject is receiving considerable attention
of late. We refer to Huber [50] for a review of the literature and a historical
survey. See also Govindarajulu and Leslie [39]. We content ourselves here by
considering some commonly used estimates and test procedures.

The most commonly used estimate for the population mean μ is the sample
mean. It has the property of unbiasedness for all populations with finite
mean. For many parent populations (normal, Poisson, Bernoulli, gamma,
etc.) it is a complete sufficient statistic and hence a UMVUE. Moreover, it is
consistent and has asymptotic normal distribution whenever the conditions
of the central limit theorem are satisfied. Nevertheless, the sample mean is
affected by extreme observations, and a single observation that is either too
large or too small may make $\bar X$ worthless as an estimate of μ. Suppose, for
example, that $X_1, X_2, \ldots, X_n$ is a sample from some normal population.
Occasionally something happens to the system, and a wild observation is
obtained; that is, suppose one is sampling from $N(\mu, \sigma^2)$, say, 100α percent
of the time and from $N(\mu, k\sigma^2)$, where k > 1, (1 - α)100 percent of the time.
Here both μ and $\sigma^2$ are unknown, and one wishes to estimate μ. In this case
one is really sampling from the density function

\[ f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x), \tag{1} \]

where $f_0$ is the pdf of $N(\mu, \sigma^2)$, and $f_1$, the pdf of $N(\mu, k\sigma^2)$. Clearly,

\[ \bar X = \frac{1}{n}\sum_{i=1}^n X_i \tag{2} \]

is again unbiased for μ. If α is nearly 1, there is no problem since the
underlying distribution is nearly $N(\mu, \sigma^2)$, and $\bar X$ is nearly the UMVUE of μ with
variance $\sigma^2/n$. If 1 - α is large (that is, not nearly 0), then, since one is
sampling from f, the variance of $X_1$ is $\sigma^2$ with probability α and is $k\sigma^2$ with
probability 1 - α, and we have

\[ \operatorname{var}_f(\bar X) = \frac{1}{n}\operatorname{var}(X_1) = \frac{\sigma^2}{n}[\alpha + (1 - \alpha)k]. \tag{3} \]


If k(1 - α) is large, $\operatorname{var}_f(\bar X)$ is large and we see that even an occasional wild
observation makes $\bar X$ subject to a sizable error. The presence of an occasional
observation from $N(\mu, k\sigma^2)$ is frequently referred to as contamination. The
problem is that we do not know, in practice, the distribution of the wild
observations and hence we do not know the pdf f. It is known that the
sample median is a much better estimate than the mean in the presence of
extreme values. In the contamination model discussed above, if we use $Z_{1/2}$,
the sample median of the $X_i$'s, as an estimate of μ (which is the population
median), then for large n

\[ E(Z_{1/2} - \mu)^2 \approx \operatorname{var}(Z_{1/2}) \approx \frac{1}{4n}\,\frac{1}{[f(\mu)]^2}. \tag{4} \]

(See Theorem 7.3.7 and Remark 7.3.1.) Since

\[ f(\mu) = \alpha f_0(\mu) + (1 - \alpha) f_1(\mu)
   = \frac{\alpha}{\sigma\sqrt{2\pi}} + \frac{1 - \alpha}{\sigma\sqrt{2\pi k}}
   = \frac{1}{\sigma\sqrt{2\pi}}\Bigl(\alpha + \frac{1 - \alpha}{\sqrt{k}}\Bigr), \]

we have

\[ \operatorname{var}(Z_{1/2}) \approx \frac{\pi\sigma^2}{2n}\,\frac{1}{\{\alpha + [(1 - \alpha)/\sqrt{k}]\}^2}. \tag{5} \]

As k → ∞, $\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/2n\alpha^2$. If there is no contamination, α = 1 and
$\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/2n$. Also,

\[ \frac{\pi\sigma^2/2n\alpha^2}{\pi\sigma^2/2n} = \frac{1}{\alpha^2}, \]

which will be close to 1 if α is close to 1. Thus the estimate $Z_{1/2}$ will not be
greatly affected by how large k is, that is, how wild the observations are. We
have

\[ \frac{\operatorname{var}(\bar X)}{\operatorname{var}(Z_{1/2})}
   = \frac{2}{\pi}\,[\alpha + (1 - \alpha)k]\Bigl(\alpha + \frac{1 - \alpha}{\sqrt{k}}\Bigr)^2 \to \infty \quad\text{as } k \to \infty. \]

Indeed, $\operatorname{var}(\bar X) \to \infty$ as k → ∞, whereas $\operatorname{var}(Z_{1/2}) \to \pi\sigma^2/2n\alpha^2$ as k → ∞.
One can check that, when k = 9 and α ≈ .915, the two variances are
(approximately) equal. As k becomes larger than 9 or α smaller than .915, $Z_{1/2}$
becomes a better estimate of μ than $\bar X$.
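The comparison between the mean and the median under contamination can be explored numerically from (3) and (5). The following is a minimal sketch under the large-sample approximations used above; the function names are illustrative, and the common factor $\sigma^2/n$ cancels in the ratio.

```python
from math import pi, sqrt

def var_mean(alpha, k, n=1, sigma2=1.0):
    """var(X-bar) = (sigma^2 / n) [alpha + (1 - alpha) k], as in (3)."""
    return sigma2 * (alpha + (1 - alpha) * k) / n

def var_median(alpha, k, n=1, sigma2=1.0):
    """Large-sample var(Z_{1/2}) = (pi sigma^2 / 2n) / {alpha + (1 - alpha)/sqrt(k)}^2, as in (5)."""
    return pi * sigma2 / (2 * n * (alpha + (1 - alpha) / sqrt(k)) ** 2)

def variance_ratio(alpha, k):
    """var(X-bar) / var(Z_{1/2}); values above 1 favor the median."""
    return var_mean(alpha, k) / var_median(alpha, k)

for k in (1, 9, 25, 100):
    print(k, round(variance_ratio(0.915, k), 3))
```

With α = 1 the ratio reduces to 2/π, the familiar large-sample efficiency of the median relative to the mean in normal sampling; near k = 9 and α = .915 the ratio is close to 1, and it grows without bound as k increases.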
There are other flaws as well. Suppose, for example, that $X_1, X_2, \ldots, X_n$ is
a sample from U(0, θ), θ > 0. Then both $\bar X$ and $T(\mathbf{X}) = (X_{(1)} + X_{(n)})/2$,
where $X_{(1)} = \min(X_1, \ldots, X_n)$, $X_{(n)} = \max(X_1, \ldots, X_n)$, are unbiased for
EX = θ/2. Also, $\operatorname{var}_\theta(\bar X) = \operatorname{var}(X)/n = \theta^2/[12n]$, and one can show that
$\operatorname{var}(T) = \theta^2/[2(n+1)(n+2)]$. It follows that the efficiency of $\bar X$ relative to
that of T is

\[ \operatorname{eff}_\theta(\bar X \mid T) = \frac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(\bar X)} = \frac{6n}{(n+1)(n+2)} < 1 \quad\text{if } n > 2. \]

In fact, $\operatorname{eff}_\theta(\bar X \mid T) \to 0$ as n → ∞, so that in sampling from a uniform parent
$\bar X$ is much worse than T, even for moderately large values of n.

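Let us next turn our attention to the estimation of standard deviation. Let
$X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then the MLE of σ is

\[ \hat\sigma = \Bigl\{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2\Bigr\}^{1/2} = \Bigl(\frac{n-1}{n}\Bigr)^{1/2} S. \tag{6} \]

Note that the lower bound for the variance of any unbiased estimate for σ
is $\sigma^2/2n$. Although $\hat\sigma$ is not unbiased, the estimate

\[ S_1 = \sqrt{\frac{n}{2}}\,\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\,\hat\sigma
       = \sqrt{\frac{n-1}{2}}\,\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\,S \tag{7} \]

is unbiased for σ. Also,

\[ \operatorname{var}(S_1) = \sigma^2\Bigl\{\frac{n-1}{2}\Bigl(\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\Bigr)^2 - 1\Bigr\}
   = \frac{\sigma^2}{2n} + O\Bigl(\frac{1}{n^2}\Bigr). \tag{8} \]

Thus the efficiency of $S_1$ (relative to the estimate with least variance = $\sigma^2/2n$) is

\[ \frac{\operatorname{var}(S_1)}{\sigma^2/2n} = 1 + O\Bigl(\frac{1}{n}\Bigr) \to 1 \quad\text{as } n \to \infty. \]

For small n, the efficiency of $S_1$ is considerably larger than 1. Thus, for n = 2,
eff($S_1$) = 2(π - 2) ≈ 2.283, and, for n = 3, eff($S_1$) = (6/π)(4 - π) ≈ 1.639.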
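The exact variance in (8) can be evaluated for any n from the gamma function. The sketch below computes the ratio $\operatorname{var}(S_1)/(\sigma^2/2n)$; the function name `eff_s1` is hypothetical, and `lgamma` is used only to avoid overflow for large n.

```python
from math import exp, lgamma, pi

def eff_s1(n):
    """var(S_1) / (sigma^2 / 2n) for a normal sample of size n >= 2, using (8)."""
    g = exp(lgamma((n - 1) / 2) - lgamma(n / 2))   # Gamma[(n-1)/2] / Gamma(n/2)
    var_over_sigma2 = (n - 1) / 2 * g * g - 1
    return 2 * n * var_over_sigma2

for n in (2, 3, 10, 100, 1000):
    print(n, round(eff_s1(n), 4))

# closed forms for the two smallest cases quoted in the text
print(2 * (pi - 2), (6 / pi) * (4 - pi))
```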


Yet another estimate of σ is the sample mean deviation

\[ S_2 = \frac{1}{n}\sum_{i=1}^n |X_i - \bar X|. \tag{9} \]

Note that

\[ E|X_i - \mu| = \sigma\sqrt{\frac{2}{\pi}}, \qquad \operatorname{var}(|X_i - \mu|) = \sigma^2\Bigl(1 - \frac{2}{\pi}\Bigr). \]

If n is large enough so that $\bar X \approx \mu$, we see that $S_3 = \sqrt{\pi/2}\,S_2$ is nearly
unbiased for σ with variance $[(\pi - 2)/2n]\sigma^2$. The efficiency of $S_3$ is

\[ \frac{[(\pi - 2)/2n]\sigma^2}{\sigma^2/2n} = \pi - 2 > 1. \]

For large n, the efficiency of $S_1$ relative to $S_3$ is

\[ \frac{\operatorname{var}(S_1)}{\operatorname{var}(S_3)} = \frac{(\sigma^2/2n) + O(1/n^2)}{[(\pi - 2)/2n]\sigma^2}
   = \frac{1}{\pi - 2} + O\Bigl(\frac{1}{n}\Bigr) \approx .876. \]

Now suppose that there is some contamination. As before, let us suppose
that for a proportion α of the time we sample from $N(\mu, \sigma^2)$ and for a
proportion 1 - α of the time we get a wild observation from $N(\mu, k\sigma^2)$,
k > 1. Assuming that both μ and $\sigma^2$ are unknown, suppose that we wish to
estimate σ. In the notation used above, let

\[ f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x), \]

where $f_0$ is the pdf of $N(\mu, \sigma^2)$, and $f_1$, the pdf of $N(\mu, k\sigma^2)$. Let us see how
even small contamination can make the maximum likelihood estimate $\hat\sigma$ of σ
quite useless.

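If $\hat\theta$ is the MLE of θ, and φ is a function of θ, then $\varphi(\hat\theta)$ is the MLE of
φ(θ). By Taylor's expansion, a first approximation is

\[ \varphi(\hat\theta) = \varphi(\hat\theta - \theta + \theta) \approx \varphi(\theta) + (\hat\theta - \theta)\varphi'(\theta), \]

so that

\[ E\{\varphi(\hat\theta) - \varphi(\theta)\}^2 \approx [\varphi'(\theta)]^2\,E(\hat\theta - \theta)^2. \tag{11} \]

Taking $\theta = \sigma^2$ and $\varphi(\theta) = \theta^{1/2}$, we get

\[ E(\hat\sigma - \sigma)^2 \approx \frac{1}{4\sigma^2}\,E(\hat\sigma^2 - \sigma^2)^2. \tag{12} \]

Using the results of Section 7.3, we see that

\[ E(\hat\sigma^2 - \sigma^2)^2 \approx \frac{\mu_4 - \mu_2^2}{n} \tag{13} \]

(dropping the other two terms with $n^2$ and $n^3$ in the denominator), so that

\[ E(\hat\sigma - \sigma)^2 \approx \frac{\mu_4 - \mu_2^2}{4n\sigma^2}. \tag{14} \]

For the density f we see that

\[ \mu_4 = 3\sigma^4[\alpha + k^2(1 - \alpha)] \tag{15} \]

and

\[ \mu_2 = \sigma^2[\alpha + k(1 - \alpha)]. \tag{16} \]

It follows that

\[ E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{4n}\bigl\{3[\alpha + k^2(1 - \alpha)] - [\alpha + k(1 - \alpha)]^2\bigr\}. \tag{17} \]

If we are interested in the effect of very small contamination, α ≈ 1 and
1 - α ≈ 0. Assuming that k(1 - α) ≈ 0, we see that

\[ E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{4n}\bigl\{3[1 + k^2(1 - \alpha)] - 1\bigr\}
   = \frac{\sigma^2}{2n}\bigl[1 + \tfrac{3}{2}k^2(1 - \alpha)\bigr]. \tag{18} \]

In the normal case, $\mu_4 = 3\sigma^4$ and $\mu_2^2 = \sigma^4$, so that

\[ E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{2n}. \]

Thus we see that the mean square error due to a small contamination is now
multiplied by a factor $[1 + \tfrac{3}{2}k^2(1 - \alpha)]$. If, for example, k = 10, α = .99,
then $1 + \tfrac{3}{2}k^2(1 - \alpha) = \tfrac{5}{2}$. If k = 10, α = .98, then $1 + \tfrac{3}{2}k^2(1 - \alpha) = 4$, and
so on.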
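The inflation factor in (18) and the less approximate expression (17) are simple to tabulate. The following sketch, with hypothetical function names, reproduces the two numerical factors quoted above and allows (17) and (18) to be compared for other values of k and α.

```python
def mse_17(alpha, k, n=1, sigma2=1.0):
    """Approximate E(sigma-hat - sigma)^2 from (17)."""
    return sigma2 / (4 * n) * (
        3 * (alpha + k**2 * (1 - alpha)) - (alpha + k * (1 - alpha)) ** 2
    )

def inflation_factor(alpha, k):
    """Factor 1 + (3/2) k^2 (1 - alpha) from (18), relative to sigma^2 / 2n."""
    return 1 + 1.5 * k**2 * (1 - alpha)

for alpha in (0.99, 0.98):
    approx = inflation_factor(alpha, 10)
    from_17 = mse_17(alpha, 10) / 0.5      # (17) divided by sigma^2 / 2n with n = 1
    print(alpha, round(approx, 3), round(from_17, 3))
```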

A quick comparison with $S_3$ shows that, although $S_1$ (or even $\hat\sigma$) is a better
estimate of σ than $S_3$ if there is no contamination, the estimate $S_3$ becomes
a much better estimate in the presence of contamination as k becomes large.

One of the most commonly used tests in statistics is Student's t-test for
testing the mean of a normal population when the variance is unknown. Let
$X_1, X_2, \ldots, X_n$ be a sample from some population with mean μ and finite
variance $\sigma^2$. As usual, let $\bar X$ denote the sample mean, and $S^2$, the sample
variance. If the population being sampled is normal, the t-test rejects $H_0$:
$\mu = \mu_0$ against $H_1$: $\mu \ne \mu_0$ at level α if $|\bar x - \mu_0| > t_{n-1,\alpha/2}(s/\sqrt{n})$. If n is
large, we replace $t_{n-1,\alpha/2}$ by the corresponding critical value, $z_{\alpha/2}$, under the
standard normal law. If the sample does not come from a normal population,
the statistic $T = [(\bar X - \mu_0)/S]\sqrt{n}$ is no longer distributed as a t(n - 1)
statistic. If, however, n is sufficiently large, we know that T has an asymptotic
normal distribution irrespective of the population being sampled, as long as
it has a finite variance. Thus, for large n, the distribution of T is independent
of the form of the population, and the t-test is robust. The same
considerations apply to testing the difference between two means when the
two variances are equal. Although we assumed that n is sufficiently large
for Cramér's result (Theorem 6.2.15) to hold, empirical investigations have
shown that the test based on Student's statistic is robust. Thus a significant
value of t may not be interpreted to mean a departure from normality of
the observations. Let us next consider the effect of departure from independence
on the t-distribution. Suppose that the observations $X_1, X_2, \ldots, X_n$ have
a multivariate normal distribution with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and ρ as the
common correlation coefficient between any $X_i$ and $X_j$, i ≠ j. Then

\[ E\bar X = \mu, \qquad \operatorname{var}(\bar X) = \frac{\sigma^2}{n}[1 + (n - 1)\rho], \tag{19} \]

and

\[ ES^2 = \frac{1}{n-1}\,E\Bigl\{\sum_{i=1}^n X_i^2 - n\bar X^2\Bigr\}
        = \frac{1}{n-1}\Bigl\{n(\sigma^2 + \mu^2) - n\Bigl[\mu^2 + \frac{\sigma^2}{n}\bigl(1 + (n-1)\rho\bigr)\Bigr]\Bigr\}
        = \sigma^2(1 - \rho). \tag{20} \]
For large n, the statistic $\sqrt{n}(\bar X - \mu)/S$ will be asymptotically distributed
as $N(0, 1 + n\rho/(1 - \rho))$, instead of N(0, 1). Under $H_0$, $T^2 = n(\bar X - \mu_0)^2/S^2$
is distributed as F(1, n - 1) when the observations are independent and
identically distributed as $N(\mu_0, \sigma^2)$. Consider the ratio

\[ \frac{nE(\bar X - \mu_0)^2}{ES^2} = \frac{\sigma^2[1 + (n - 1)\rho]}{\sigma^2(1 - \rho)} = 1 + \frac{n\rho}{1 - \rho}. \tag{21} \]

The ratio equals 1 if ρ = 0, but is > 1 for ρ > 0 and → ∞ as ρ → 1. It follows
that a large value of T is likely to occur when ρ > 0 and is large, even
though $\mu_0$ is the true value of the mean. Thus a significant value of t may
be due to departure from independence, and the effect can be serious.

Next, consider a test of the null hypothesis $H_0$: $\sigma = \sigma_0$ against $H_1$: $\sigma \ne \sigma_0$.
Under the usual normality assumptions on the observations $X_1, X_2, \ldots, X_n$,
the test statistic used is

\[ V = \frac{(n - 1)S^2}{\sigma_0^2} = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma_0^2}, \tag{22} \]

which has a $\chi^2(n - 1)$ distribution under $H_0$. The usual test is to reject $H_0$
if

\[ V_0 = \frac{(n - 1)s^2}{\sigma_0^2} > \chi^2_{n-1,\alpha/2} \quad\text{or}\quad V_0 < \chi^2_{n-1,1-\alpha/2}. \tag{23} \]

Let us suppose that $X_1, X_2, \ldots, X_n$ are not normal. It follows from Corollary
2 to Theorem 7.3.5 that

\[ \operatorname{var}(S^2) = \frac{\mu_4}{n} + \frac{3 - n}{n(n - 1)}\,\mu_2^2, \tag{24} \]

so that

\[ \operatorname{var}\Bigl(\frac{S^2}{\sigma^2}\Bigr) = \frac{1}{n}\,\frac{\mu_4}{\sigma^4} + \frac{3 - n}{n(n - 1)}. \tag{25} \]

Writing $\gamma_2 = (\mu_4/\sigma^4) - 3$, we have

\[ \operatorname{var}\Bigl(\frac{S^2}{\sigma^2}\Bigr) = \frac{\gamma_2}{n} + \frac{2}{n - 1} \tag{26} \]

when the $X_i$'s are not normal, and

\[ \operatorname{var}\Bigl(\frac{S^2}{\sigma^2}\Bigr) = \frac{2}{n - 1} \tag{27} \]

when the $X_i$'s are normal ($\gamma_2 = 0$). Now $(n - 1)S^2 = \sum_{j=1}^n (X_j - \bar X)^2$ is the
sum of n identically distributed but dependent rv's $(X_j - \bar X)^2$, j = 1, 2, ..., n.
Using a version of the central limit theorem for dependent rv's (see, for
example, Cramér [18], 365), it follows that

\[ \Bigl(\frac{n - 1}{2}\Bigr)^{1/2}\Bigl(\frac{S^2}{\sigma^2} - 1\Bigr), \]

under $H_0$, is asymptotically $N(0, 1 + (\gamma_2/2))$, and not N(0, 1), as under the
normal theory. As a result the size of the test based on the statistic $V_0$ will
be different from the stated level of significance if $\gamma_2$ differs greatly from 0.
It is clear that the effect of violation of the normality assumption can be
quite serious on inferences about variances, and the chi-square test is not
robust.
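The effect of $\gamma_2$ on the size of the test can be quantified from the limiting distribution just stated. The sketch below assumes a two-sided large-sample version of the test that rejects when the standardized statistic exceeds $z_{\alpha/2}$ in absolute value; that normal-theory cutoff, the function name `asymptotic_size`, and the sample values of $\gamma_2$ are assumptions made for illustration.

```python
from statistics import NormalDist

def asymptotic_size(gamma2, alpha=0.05):
    """Limiting size of the nominal level-alpha two-sided test when the
    standardized statistic is really N(0, 1 + gamma2 / 2); requires gamma2 > -2."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)      # normal-theory critical value
    scale = (1 + gamma2 / 2) ** 0.5            # true asymptotic standard deviation
    return 2 * (1 - std_normal.cdf(z / scale))

for g2 in (-1.0, 0.0, 1.0, 3.0, 6.0):
    print(g2, round(asymptotic_size(g2), 3))
```

When $\gamma_2 = 0$ the nominal level is recovered; positive values of $\gamma_2$ inflate the size and negative values deflate it.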
For a similar examination of some other well-known test criteria, we refer
to Scheffé [111], Chapter 10.
In the above discussion we have used somewhat crude calculations to
investigate the behavior of the most commonly used estimates and test
statistics when one or more of the underlying assumptions are violated. Our
purpose here was to indicate that some tests or estimates are robust whereas
others are not. The moral is clear: One should check carefully to see that the
underlying assumptions are satisfied before using parametric procedures.


PROBLEMS 13.7


1. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and
$\operatorname{cov}(X_i, X_j) = \rho\sigma^2$ if |i - j| = 1 and = 0 otherwise.
(a) Show that

\[ \operatorname{var}(\bar X) = \frac{\sigma^2}{n}\Bigl[1 + 2\rho\Bigl(1 - \frac{1}{n}\Bigr)\Bigr] \]

and

\[ ES^2 = \sigma^2\Bigl(1 - \frac{2\rho}{n}\Bigr). \]

(b) Show that the t-statistic $\sqrt{n}(\bar X - \mu)/S$ is asymptotically normally distributed
with mean 0 and variance 1 + 2ρ. Conclude that the significance of t is
overestimated for positive values of ρ and underestimated for ρ < 0 in large
samples.

(c) For finite n, consider the statistic

\[ T^2 = \frac{n(\bar X - \mu)^2}{S^2}. \]

Compare the expected values of the numerator and the denominator of $T^2$
and study the effect of ρ ≠ 0 to interpret significant t values. (Scheffé [111], 338)


2. Let $X_1, X_2, \ldots, X_n$ be a random sample from G(α, β), α > 0, β > 0.


(a) Show that

\[ \mu_2 = \alpha\beta^2, \qquad \mu_4 = 3\alpha(\alpha + 2)\beta^4. \]
(b) Show that 


var fo) St x oC ne ©), 


(c) Show that the large-sample distribution of $(n - 1)S^2/\sigma^2$ is normal.
(d) Compare the large-sample test of $H_0$: $\sigma = \sigma_0$ based on the asymptotic
normality of $(n - 1)S^2/\sigma^2$ with the large-sample test based on the same statistic
when the observations are taken from a normal population. In particular,
155 take g 2220603 o ёла MT. f 
3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two independent random samples
from populations with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
Let $\bar X$, $\bar Y$ be the two sample means, and $S_1^2$, $S_2^2$ be the two sample variances.
Write N = m + n, R = m/n, and $\theta = \sigma_1^2/\sigma_2^2$. The usual normal theory test of
$H_0$: $\mu_1 - \mu_2 = \delta_0$ is the t-test based on the statistic

\[ T = \frac{\bar X - \bar Y - \delta_0}{S_p\,(1/m + 1/n)^{1/2}}, \]

where

\[ S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}. \]

Under $H_0$, the statistic T has a t-distribution with N - 2 d.f., provided that
$\sigma_1^2 = \sigma_2^2$.

Show that the asymptotic distribution of T in the nonnormal case is
$N(0, (\theta + R)(1 + R\theta)^{-1})$ for large m and n. Thus, if R = 1, T is asymptotically
N(0, 1) as in the normal theory case assuming equal variances, even though the
two samples come from nonnormal populations with unequal variances. Conclude
that the test is robust in the case of large, equal sample sizes.


(Scheffé [111], 339)


CHAPTER 14


Sequential Statistical Inference


14.1 INTRODUCTION


In all the problems of statistical inference that we have considered so far, the
sample size was fixed in advance. We assumed that a sample of fixed size n
was available which did not depend on the observations. It is easy to
construct examples in which such a procedure is obviously wasteful since it does
not take advantage of the information supplied by the observations. Thus, in
a coin-tossing experiment where we wish to test $H_0$: p = 1/2, if n - 1 tosses
produce heads the result of the nth trial is not likely to change our decision
to reject $H_0$, provided that n is not too small. This example suggests that it
may be disadvantageous in some problems to fix the sample size in advance.
An alternative procedure suggests itself: Sample sequentially, that is,
take one observation after another and make a decision to stop after the nth
observation by using some procedure that is based on the observations
$X_1, X_2, \ldots, X_n$.

In this chapter we consider some problems of sequential statistical
inference. Section 2 introduces some fundamental ideas, Sections 3 and 4 deal
with sequential (parametric) point and interval estimation, while Sections 5,
6, and 7 examine the sequential probability ratio test of Wald.


14.2 SOME FUNDAMENTAL IDEAS OF SEQUENTIAL SAMPLING


In all the problems of statistical inference considered so far, we have assumed
the availability of a sample of a certain size n. This ignores the fact that
sampling is expensive and the taking of each observation involves some cost.
The question then arises as to what should be the minimum size required to
make a decision. Let us consider some examples.


Example 1. In estimating the mean μ in sampling from a normal population,
the sample mean $\bar X$ is an unbiased estimate. Let us suppose that each
observation costs $A (A > 0), and that we wish to choose the size n to make
the average loss,

\[ E|\bar X - \mu|^2 + An, \tag{1} \]

minimum. We have

\[ L(n, \mu, \sigma^2) = E|\bar X - \mu|^2 + An = \frac{\sigma^2}{n} + nA, \tag{2} \]

which is minimized if we take n to be a solution of

\[ \frac{\partial L}{\partial n} = -\frac{\sigma^2}{n^2} + A = 0, \]

that is, $n = n_0$, where

\[ n_0 = \frac{\sigma}{\sqrt{A}}. \tag{3} \]

Here we considered n as a continuous variable. It is easy to show that
$n = n_0$ minimizes L and that the minimum value of L is

\[ L(n_0) = 2\sigma\sqrt{A} = 2An_0. \tag{4} \]

If σ is known, one takes $n_0$ observations ($n_0 = [\sigma/\sqrt{A}] + 1$) and estimates
μ by $\bar X$; in doing so, one minimizes the average loss, as defined in (1). In
practice, however, σ will not be known and one cannot compute $n_0$. How
should one proceed in view of the ignorance about σ?
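When σ is known, the optimal fixed sample size of Example 1 is a one-line computation. The sketch below, with hypothetical function names and illustrative values σ = 3 and A = .25, evaluates (3) and checks that the loss (2) at $n_0$ equals $2\sigma\sqrt{A}$ as in (4).

```python
from math import floor, sqrt

def optimal_n(sigma, cost):
    """Return (n0, integer sample size), with n0 = sigma / sqrt(A) as in (3)."""
    n0 = sigma / sqrt(cost)
    return n0, floor(n0) + 1     # [sigma / sqrt(A)] + 1, as in the text

def average_loss(n, sigma, cost):
    """L(n) = sigma^2 / n + A n, as in (2)."""
    return sigma**2 / n + cost * n

n0, n_int = optimal_n(sigma=3.0, cost=0.25)
print(n0, n_int, average_loss(n0, 3.0, 0.25), 2 * 3.0 * sqrt(0.25))
```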


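Example 2. Let us suppose that we wish to test $H_0: \mu = \mu_0$ against
$H_1: \mu = \mu_1$ ($\mu_1 > \mu_0$) in sampling from a normal population with
mean $\mu$ (unknown) and known variance $\sigma^2$. We would like to make a
decision on the basis of the smallest sample size $n$. Suppose that the
probabilities of the two types of error have to be fixed at $\alpha$ and $\beta$.
We know that the MP test of $H_0$ rejects $H_0$ if $\bar X > c$. Thus

(5)    $\alpha = P_{\mu_0}\{\bar X > c\} = P\left\{\dfrac{(\bar X - \mu_0)\sqrt{n}}{\sigma} > \dfrac{(c - \mu_0)\sqrt{n}}{\sigma}\right\}$,

so that

(6)    $\dfrac{(c - \mu_0)\sqrt{n}}{\sigma} = z_\alpha$,

where $z_\alpha$ is the upper $\alpha$ point of the $\mathcal{N}(0, 1)$ distribution. Also,

(7)    $\beta = P_{\mu_1}\{\bar X \le c\} = P\left\{\dfrac{(\bar X - \mu_1)\sqrt{n}}{\sigma} \le \dfrac{(c - \mu_1)\sqrt{n}}{\sigma}\right\}$,

so that

(8)    $\dfrac{(c - \mu_1)\sqrt{n}}{\sigma} = -z_\beta$.

From (6) and (8) we get

(9)    $n_0 = \dfrac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_1 - \mu_0)^2}$.

Thus, if we take $n = n_0$ observations, where $n_0$ is given by (9), and reject
$H_0$ if

       $\bar X > \mu_0 + z_\alpha \dfrac{\sigma}{\sqrt{n_0}}$,

the test will have the desired probabilities of the two types of error. In
practice, however, $\sigma$ will not be known and one cannot compute $n_0$. How
does one proceed in view of the ignorance of $\sigma$?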
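
As a numerical illustration of (9), the following sketch computes $n_0$, rounds it up to an integer, and returns the cutoff $c$. It assumes that $\sigma$ is known and uses the standard-library `statistics.NormalDist` for normal quantiles; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_test(mu0, mu1, sigma, alpha, beta):
    """Sample size (9) and cutoff c for testing mu = mu0 against mu = mu1 > mu0."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)   # upper alpha point
    z_beta = std_normal.inv_cdf(1.0 - beta)     # upper beta point
    n0 = ((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2
    n = math.ceil(n0)                           # round up to an integer sample size
    cutoff = mu0 + z_alpha * sigma / math.sqrt(n)
    return n0, n, cutoff


# Illustrative values only: mu0 = 0, mu1 = 1, sigma = 2, alpha = 0.05, beta = 0.10.
n0, n, cutoff = sample_size_for_test(0.0, 1.0, 2.0, 0.05, 0.10)
achieved_beta = NormalDist().cdf((cutoff - 1.0) * math.sqrt(n) / 2.0)
print(n0, n, cutoff, achieved_beta)
```

Because $n$ is rounded up, the achieved type II error probability is slightly below the nominal $\beta$, while the type I error probability is exactly $\alpha$.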

Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$,
where both $\mu$ and $\sigma^2$ are unknown. The shortest confidence interval of
level $1 - \alpha$ based on $\bar X$ for the parameter $\mu$ is

       $\left(\bar X - \dfrac{S}{\sqrt{n}}\, t_{n-1,\alpha/2},\ \bar X + \dfrac{S}{\sqrt{n}}\, t_{n-1,\alpha/2}\right)$,

where $S^2$ is the sample variance. This interval has length
$2(S/\sqrt{n})\, t_{n-1,\alpha/2}$, which is a random variable. In practice, we would
like to find a fixed width confidence interval with a given confidence level
based on the minimum number of observations. If $\sigma$ were known, the
$1 - \alpha$ level confidence interval
$(\bar X - (\sigma/\sqrt{n})\, z_{\alpha/2},\ \bar X + (\sigma/\sqrt{n})\, z_{\alpha/2})$
has length $(2\sigma/\sqrt{n})\, z_{\alpha/2}$. This length is at most $2d$, say,
where $d\ (> 0)$ is fixed in advance, provided that we choose $n$ such that

       $\dfrac{\sigma}{\sqrt{n}}\, z_{\alpha/2} \le d$

or

(10)   $n \ge \dfrac{\sigma^2 z_{\alpha/2}^2}{d^2}$.

If $\sigma$ is not known, one does not know this minimum sample size. When
$\sigma$ is unknown, how does one sample to minimize the sample size?
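
For a concrete feel for (10), the sketch below returns the smallest integer $n$ satisfying the inequality when $\sigma$ is known. The function name and the numbers used are illustrative assumptions, not values from the text.

```python
import math
from statistics import NormalDist


def fixed_width_sample_size(sigma: float, d: float, alpha: float) -> int:
    """Smallest integer n with n >= (z_{alpha/2} * sigma / d)**2, as in (10)."""
    z_half_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z_half_alpha * sigma / d) ** 2)


# Illustrative values: sigma = 2, half-width d = 0.5, level 1 - alpha = 0.95.
print(fixed_width_sample_size(sigma=2.0, d=0.5, alpha=0.05))
```

Halving $d$ roughly quadruples the required sample size, which is one reason the dependence on the unknown $\sigma$ matters so much in practice.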


In the examples above we see that it is not always possible to minimize
the number of observations required to arrive at a decision that is optimum
in some sense. Note that all this discussion is based on the fact that we take
a sample of a certain size $n$ which is fixed in advance and may or may not
be the smallest number required. An alternative procedure suggests itself:
Why not take observations sequentially, that is, one at a time, and use the
information provided by the observations to date to determine whether a
further observation is required? In such a case we do not need to fix the
number of observations in advance of the experiment.

In this section we consider some basic notions of sequential statistical
inference. Given an infinite sequence $X = (X_1, X_2, \ldots)$ of rv's, the
statistician faces the problem of providing a set of rules that tells the
experimenter when to stop sampling. Once the sampling is stopped after taking,
say, $n$ observations, the decision problem is treated as a fixed sample size
problem.

Let $\Theta$ be the parameter space, and $\mathcal{A}$, the space of actions open
to the statistician. We assume that the rv's $X_1, X_2, \ldots$ that are observed
sequentially are iid. Let $f_\theta(x)$ be the common pdf (pmf) of the $X_i$'s.


Definition 1. A sequential decision procedure has two components.

(a) A stopping rule that specifies whether an element of $\mathcal{A}$ should be
    chosen without taking any observation. If at least one observation
    is taken, the rule specifies, for every set of observed values
    $(x_1, x_2, \ldots, x_n)$, $n \ge 1$, whether to stop sampling and choose a
    decision in $\mathcal{A}$ or to take another observation $x_{n+1}$.

(b) A decision rule that specifies the decision $d_0 \in \mathcal{A}$ if no
    observations are taken, or, if at least one observation is taken, specifies
    the action $d_n(x_1, x_2, \ldots, x_n) \in \mathcal{A}$ to be taken for each set
    of observed values $(x_1, x_2, \ldots, x_n)$ after which the sampling might
    be stopped.


We will assume that the statistician takes at least one observation before
making a decision. It is convenient to define stopping regions $R_n$,
$n = 1, 2, \ldots$.


Definition 2. Let $R_n \subseteq \mathcal{R}_n$, $n = 1, 2, \ldots$, be a sequence of
Borel-measurable sets such that the sampling is terminated after observing
$X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ if $(x_1, x_2, \ldots, x_n) \in R_n$. If
$(x_1, x_2, \ldots, x_n) \notin R_n$, another observation $x_{n+1}$ is taken. The
sets $R_n$, $n = 1, 2, \ldots$, are called stopping regions.


Definition 3. With every sequential stopping rule we associate a stopping
rv $N$, which takes on the values $1, 2, 3, \ldots$. $N$ is the (random) total
number of observations taken before sampling is stopped.

Let $\{N = n\}$ denote the event that sampling is stopped after observing
$n$ values $x_1, x_2, \ldots, x_n$, and not before. Thus $\{N = 1\} = R_1$, and

       $\{N = n\} = \{(x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$: sampling is stopped after observing
                    $x_n$ and not before$\}$
                 $= (R_1 \cup R_2 \cup \cdots \cup R_{n-1})^c \cap R_n$
(11)             $= R_1^c \cap R_2^c \cap \cdots \cap R_{n-1}^c \cap R_n$.

Note that $\{N = n\}$, and the event $\{N \le n\} = \bigcup_{k=1}^{n} \{N = k\}$, depend
only on the observations $X_1, X_2, \ldots, X_n$, and not on $X_{n+1}, X_{n+2}, \ldots$.

In what follows, we will consider only closed sequential sampling
procedures, that is, procedures for which sampling eventually terminates with
probability 1. Thus we will assume that

(12)   $P\{N < \infty\} = 1$

or

       $P\{N = \infty\} = 1 - P\{N < \infty\} = 0$.

Example 4. In Example 1, suppose that $\sigma$ is not known. Consider the
sequential stopping rule $T_A$:
Stop after taking $n \ge 2$ observations if $n$ is the smallest integer
such that $n \ge s_n/\sqrt{A}$, where

       $s_n^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x_n)^2$,    $\bar x_n = \dfrac{1}{n} \sum_{i=1}^{n} x_i$.

Clearly $T_A$ is a sequential stopping rule that says: After taking each
observation, compute the corresponding $s_i^2$ and check to see whether
$i \ge s_i/\sqrt{A}$. If so, stop; otherwise, take another observation $x_{i+1}$.
If $N = n$, that is, sampling stops after observing $x_n$, then estimate $\mu$ by
$\bar x = \sum_{i=1}^{n} x_i/n$.
Here

       $\{N = n\} = \{2 < s_2/\sqrt{A},\ 3 < s_3/\sqrt{A},\ \ldots,\ n - 1 < s_{n-1}/\sqrt{A},\ n \ge s_n/\sqrt{A}\}$.
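
The rule $T_A$ of Example 4 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is hypothetical, the observations are supplied as a finite Python iterable, and `None` is returned if the data run out before the rule stops.

```python
import math
import statistics


def stopping_time_TA(observations, cost):
    """Apply rule T_A: stop at the first n >= 2 with n >= s_n / sqrt(cost).

    Returns (n, sample mean at stopping), or None if the supplied
    observations are exhausted before the rule says to stop.
    """
    seen = []
    for x in observations:
        seen.append(x)
        n = len(seen)
        # statistics.stdev uses the divisor n - 1, matching s_n in the text.
        if n >= 2 and n >= statistics.stdev(seen) / math.sqrt(cost):
            return n, statistics.fmean(seen)
    return None


# Hypothetical data alternating between 0 and 10, with cost A = 1 per observation.
print(stopping_time_TA([0, 10, 0, 10, 0, 10, 0, 10], cost=1.0))
```

Recomputing the standard deviation from scratch at every step keeps the sketch close to the verbal rule; a running-sum update would be preferable for long sequences.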
Example 5. Suppose that we wish to estimate the probability $\theta$,
$0 < \theta < 1$, of obtaining heads when a given coin is tossed. In fixed sample
size procedures of estimation that we have studied, one tosses a coin $n$ times
($n$ fixed in advance) and counts the number of heads. Then the proportion
of heads in $n$ trials is a reasonable (unbiased and consistent) estimate of
$\theta$. The larger the sample size $n$, the smaller is the variance of this
estimate.

If, instead, we toss the coin until it shows the first head, the procedure is
clearly a sequential decision procedure. Let $N$ be the number of trials needed
for the first head. Then

       $P\{N = n + 1\} = \theta(1 - \theta)^n$,    $n = 0, 1, 2, \ldots$,

and since

       $P\{N < \infty\} = \sum_{n=0}^{\infty} \theta(1 - \theta)^n = 1$,

we note that the sampling terminates with probability 1 (eventually) and
the sampling procedure is a closed one. Also,

       $EN = \sum_{n=0}^{\infty} (n + 1)\, \theta(1 - \theta)^n$
          $= 1 + \sum_{n=0}^{\infty} n\, \theta(1 - \theta)^n = \dfrac{1}{\theta} < \infty$    for $0 < \theta < 1$.

The MLE of $\theta$ is

       $T(N) = \dfrac{1}{N}$.

We may write the sequential procedure in terms of each observation as
follows. Let $X_i = 1$ if a head shows up on the $i$th toss, and $= 0$ otherwise.
Then we stop sampling after observing $X_{n+1}$ if $\sum_{i=1}^{n} X_i = 0$ and
$X_{n+1} = 1$. The decision rule is to estimate $\theta$ by the reciprocal of the
number of observations needed to obtain the first head.

Note that

(13)   $E\,T(N) = \sum_{n=1}^{\infty} \dfrac{1}{n}\, \theta(1 - \theta)^{n-1} = -\dfrac{\theta \log \theta}{1 - \theta}$,    $0 < \theta < 1$.
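
The expectation of $T(N) = 1/N$ in Example 5 can be checked numerically. The sketch below compares a truncated version of the series $\sum_{n \ge 1} n^{-1}\theta(1-\theta)^{n-1}$ with the closed form $-\theta\log\theta/(1-\theta)$ obtained from the logarithmic series; the function names and the truncation point are illustrative assumptions.

```python
import math


def expected_reciprocal(theta: float, terms: int = 10_000) -> float:
    """Truncated series for E(1/N) when N is the trial of the first head."""
    return sum(theta * (1.0 - theta) ** (n - 1) / n for n in range(1, terms + 1))


def closed_form(theta: float) -> float:
    """Closed form -theta * log(theta) / (1 - theta) for E(1/N)."""
    return -theta * math.log(theta) / (1.0 - theta)


for theta in (0.2, 0.5, 0.8):
    print(theta, expected_reciprocal(theta), closed_form(theta))
```

For every $\theta$ in $(0, 1)$ the printed expectation exceeds $\theta$, so the MLE $1/N$ overestimates $\theta$ on the average.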


Example 6. Suppose that two players, A and B, play a series of games. A's
chance of winning a game is $p$, and B's chance is $1 - p$, $0 < p < 1$. Suppose
that the loser of each game pays the winner one dollar and that A starts with
a capital of $a$ dollars and B with $b$ dollars. The games continue until A
(or B) is ruined, that is, when B (or A) has all the money $a + b$.

Let $p_x$ be the probability of A's ruin when he has $x$ dollars,
$0 < x < a + b$. At the next game he has either $x + 1$ dollars, if he wins, or
$x - 1$ dollars, if he loses. Assuming independence, we have

(14)   $p_x = p\, p_{x+1} + (1 - p)\, p_{x-1}$.

The boundary conditions are

(15)   $p_0 = 1$,    $p_{a+b} = 0$.

It is easy to show that the difference equation (14) subject to (15) has a
unique solution:

(16)   $p_x = \dfrac{[(1 - p)/p]^{a+b} - [(1 - p)/p]^{x}}{[(1 - p)/p]^{a+b} - 1}$    if $p \ne \tfrac{1}{2}$,
       $p_x = 1 - \dfrac{x}{a + b}$    if $p = \tfrac{1}{2}$.

(See Problem 4.) Using an analogous argument for B's ruin, we can verify
that the series of games terminates with probability 1, that is, the
probability of A's ruin plus that of B's ruin is 1, so the procedure is a closed
one. In particular,

(17)   $p_a = \dfrac{[(1 - p)/p]^{a+b} - [(1 - p)/p]^{a}}{[(1 - p)/p]^{a+b} - 1}$    if $p \ne \tfrac{1}{2}$,
       $p_a = \dfrac{b}{a + b}$    if $p = \tfrac{1}{2}$.

The expected duration of the game is given by

(18)   $EN = \dfrac{a}{1 - 2p} - \dfrac{a + b}{1 - 2p} \cdot \dfrac{1 - [(1 - p)/p]^{a}}{1 - [(1 - p)/p]^{a+b}}$    if $p \ne \tfrac{1}{2}$,
       $EN = ab$    if $p = \tfrac{1}{2}$.

(See Problem 5.) We will derive (17) and (18) by a different method in
Example 14.6.6.
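
The gambler's ruin quantities can be evaluated directly. The sketch below uses the classical closed forms for the ruin probability and the expected duration and checks them against the one-step recurrences $p_x = p\,p_{x+1} + (1-p)\,p_{x-1}$ and $E_x = 1 + p\,E_{x+1} + (1-p)\,E_{x-1}$; the function names and the numerical values are illustrative assumptions.

```python
def ruin_probability(x: int, total: int, p: float) -> float:
    """Probability that A, holding x out of `total` dollars, is ruined."""
    if p == 0.5:
        return 1.0 - x / total
    r = (1.0 - p) / p
    return (r ** total - r ** x) / (r ** total - 1.0)


def expected_duration(a: int, b: int, p: float) -> float:
    """Expected number of games when A starts with a and B with b dollars."""
    total = a + b
    if p == 0.5:
        return float(a * b)
    r = (1.0 - p) / p
    return a / (1.0 - 2.0 * p) - (total / (1.0 - 2.0 * p)) * (1.0 - r ** a) / (1.0 - r ** total)


# Illustrative check of the recurrences for p = 0.6 and a + b = 10.
p, total = 0.6, 10
for x in range(1, total):
    lhs = ruin_probability(x, total, p)
    rhs = p * ruin_probability(x + 1, total, p) + (1 - p) * ruin_probability(x - 1, total, p)
    assert abs(lhs - rhs) < 1e-12
print(ruin_probability(3, 10, 0.5), expected_duration(3, 7, 0.5))
```

The exact comparison `p == 0.5` is adequate for a sketch; a production version would switch to the fair-game formula whenever $p$ is numerically close to $1/2$, where the general expression loses accuracy.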


We conclude this section with some remarks concerning the precision of
a sequential procedure. Let $L(\theta, d(X))$ be the loss incurred when we make
decision $d(X)$ and $\theta$ is the true parameter value. In sequential sampling
another loss enters, namely, the cost $c(N)$ of taking $N$ observations. The
statistician's problem therefore is to choose a stopping rule and a decision
rule such that the average loss is minimum, or the average loss and cost are
minimum, or, simply, the cost is minimum, or the average loss is bounded,
and so on. Thus, in the problem of estimating a parameter, the statistician
may be interested, for example, in an unbiased estimate of $\theta$ such that the
loss $L(\theta, d(X)) = (\theta - d(X))^2$ is minimum, on the average.


PROBLEMS 14.2

1. In Example 1 take the loss to be as follows: [...]. Compute the minimum risk
in each case.

2. In Example 5 suppose that we take observations until we observe the $r$th head.
Is this sampling procedure a closed sequential procedure? Let $N$ be the number of
trials required to stop. Compute the pmf of $N$; compute $EN$. Show that $r/N$ is
the MLE of $\theta$. Find an unbiased estimate of $\theta$.

3. Let $X_1, X_2, \ldots$ be observations on a Poisson rv with parameter $\lambda$. It is
required to estimate $\lambda$ sequentially. The stopping rule is to stop at the first
$N = n$ for which $\sum_{i=1}^{n} X_i \ge 1$. Let $N$ be the number of observations
required. Is this a closed sequential procedure? Find $E_\lambda N$. If the stopping
rule is changed as follows: Stop at the first $N = n$ for which
$\sum_{i=1}^{n} X_i \ge 2$, find the distribution of $N$. Also, find $E_\lambda N$.

4. Prove (16) by writing $(p + q)\, p_x = p\, p_{x+1} + q\, p_{x-1}$, where $q = 1 - p$. Thus
$p_{x+1} - p_x = (q/p)(p_x - p_{x-1})$.

5. Prove (18) by an argument similar to the one used in Problem 4. (Hint: If $E_x$
is the expected duration of the game, starting with a capital of $x$, then

       $E_x = 1 + p E_{x+1} + q E_{x-1}$,    $x = 1, 2, \ldots, a + b - 1$.

Clearly $E_0 = E_{a+b} = 0$.)


14.3 SEQUENTIAL UNBIASED ESTIMATION 


In this section we consider some simple problems of sequential estimation.
The subject of sequential estimation is not yet very well developed, and the
area is under active investigation at this time. Basically no new methods of
estimation are needed. One takes observations sequentially and stops according
to some well-defined sequential procedure that is a function of the
observations to date. Then one estimates the parameter on the basis of the
sample thus far obtained by using some standard estimation procedure like
the method of unbiased estimation or the method of maximum likelihood
estimation. What makes the problem difficult is the fact that the sample space
changes with each observation, and one has to solve the distribution problem
resulting from the introduction of the stopping variable. We have already
considered such a problem in Example 14.2.5, as well as in Example 14.2.1,
which we will discuss in some detail in the next section. We first derive some
results of general interest concerning the mean and variance of the stopping
rv and find an analogue of the Fréchet-Cramér-Rao lower bound for the
variance of a sequential estimate.

Let $X_1, X_2, \ldots$ be a sequence of independent rv's with common pdf (pmf)
$f_\theta(x)$, $\theta \in \Theta$. Suppose that we wish to estimate $\theta$ so that the
risk $E_\theta L(\theta, d(X)) = E_\theta(\theta - d(X))^2$, say, is minimum. If we
restrict attention to unbiased estimates of $\theta$, it would be of interest to
find a lower bound for the variance of such estimates. In other words, an
extension of the Fréchet-Cramér-Rao lower bound to the sequential unbiased
estimation case is sought. To obtain just such an extension we need some
preliminary results.


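Theorem 1 (Wald's Equation). Let $X_1, X_2, \ldots$ be iid rv's with
$E|X_1| < \infty$. Let $N$ be a stopping variable, and write
$S_N = \sum_{k=1}^{N} X_k$. If $EN < \infty$, then

(1)    $E\{S_N\} = EX_1\, EN$.


Proof. Define rv's

(2)    $Y_i = 1$ if no decision is reached up to the $(i - 1)$th stage,
              that is, if $N > (i - 1)$,
       $Y_i = 0$ otherwise.

Then $Y_i$ is a function of $X_1, X_2, \ldots, X_{i-1}$ alone and is independent of
$X_i$. Consider the rv $\sum_{n=1}^{\infty} X_n Y_n$. Clearly,

       $S_N = \sum_{n=1}^{\infty} X_n Y_n$.

Thus

(3)    $ES_N = E\left(\sum_{n=1}^{\infty} X_n Y_n\right)$.

Now

       $\sum_{n=1}^{\infty} E|X_n Y_n| = \sum_{n=1}^{\infty} E|X_n|\, E|Y_n|$
                                   $= E|X_1| \sum_{n=1}^{\infty} P\{N \ge n\}$
                                   $= E|X_1| \sum_{n=1}^{\infty} \sum_{k=n}^{\infty} P\{N = k\}$
                                   $= E|X_1| \sum_{n=1}^{\infty} n\, P\{N = n\}$
                                   $= E|X_1|\, EN < \infty$.

We may therefore interchange the expectation and summation signs in (3),
obtaining

       $ES_N = \sum_{n=1}^{\infty} EX_n\, EY_n$
            $= EX_1 \sum_{n=1}^{\infty} P\{N \ge n\}$
            $= EX_1\, EN$,

as asserted.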
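
Wald's equation can be verified exactly in a small discrete case. The sketch below is an illustration, not part of the original argument: it takes steps $X_i = \pm 1$ with $P\{X_i = 1\} = p$, stops when $|S_n|$ first reaches a barrier or when $n$ reaches a cap (so $N$ is a bounded stopping variable), and enumerates all paths to compare $ES_N$ with $EX_1\,EN$. The function name, the barrier, and the cap are assumptions chosen for the example.

```python
from itertools import product


def wald_check(p: float, barrier: int = 2, max_n: int = 8):
    """Return (E S_N, E X_1 * E N) for a +/-1 walk stopped at |S_n| = barrier or n = max_n."""
    exp_sum = 0.0
    exp_n = 0.0
    for path in product((1, -1), repeat=max_n):
        # Probability of the full path of max_n independent steps.
        prob = 1.0
        for step in path:
            prob *= p if step == 1 else 1.0 - p
        # Run the stopping rule along the path; it uses only the steps seen so far.
        s, n = 0, 0
        for step in path:
            s += step
            n += 1
            if abs(s) >= barrier:
                break
        exp_sum += prob * s
        exp_n += prob * n
    mean_step = 2.0 * p - 1.0
    return exp_sum, mean_step * exp_n


print(wald_check(0.6))
```

The two returned numbers agree up to rounding error, as the theorem requires for any stopping variable with finite expectation.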


Remark 1. This result is due to Wald [133]. The proof given above is due
to Johnson [53]. An analysis of the proof shows that the rv's $X_n$ do not
need to have the same distribution, provided we assume that $EX_n = \mu$ for
all $n$, $E|X_n| \le A < \infty$ for all $n$, and $EN < \infty$.


Remark 2. If the rv's $X_1, X_2, \ldots$ are iid and are independent of the iid rv's
$Y_1, Y_2, \ldots$, and if $M$ denotes the random number of $X$'s, and $N$, the random
number of $Y$'s, then

       $E|X| < \infty$,    $E|Y| < \infty$,    $EM < \infty$,    $EN < \infty$

imply

(4)    $E\left(\sum_{i=1}^{M} X_i + \sum_{j=1}^{N} Y_j\right) = EM\, EX + EN\, EY$.

We leave the reader to complete the proof of (4).


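Theorem 2. Let $X_1, X_2, \ldots$ be a sequence of iid rv's with common mean 0
and variance $\sigma^2$, $0 < \sigma^2 < \infty$. For any sequential stopping rule with
$EN < \infty$, if $E\left(\sum_{i=1}^{N} |X_i|\right)^2 < \infty$, then

(5)    $E\left(\sum_{i=1}^{N} X_i\right)^2 = \sigma^2\, EN$.

Proof. Let $Y_i$ be as defined in (2). Since $Y_i^2 = Y_i$ and $Y_i Y_j = Y_i$ for
$j < i$, we have

(6)    $E\left(\sum_{i=1}^{N} X_i\right)^2 = E\left(\sum_{i=1}^{\infty} X_i Y_i\right)^2
          = E\left(\sum_{i=1}^{\infty} X_i^2 Y_i\right) + 2E\left(\sum_{i=2}^{\infty} \sum_{j=1}^{i-1} X_i X_j Y_i\right)$.

Since

       $E\left(\sum_{i=1}^{\infty} |X_i| Y_i\right)^2 = E\left(\sum_{i=1}^{N} |X_i|\right)^2 < \infty$,

we may interchange the summation and expectation signs in (6). We have
from Theorem 1

       $E\left(\sum_{i=1}^{\infty} X_i^2 Y_i\right) = E\left(\sum_{i=1}^{N} X_i^2\right)$
                                                $= EX_1^2\, EN$
                                                $= \sigma^2\, EN$.

Now

       $E\left(\sum_{i=2}^{\infty} \sum_{j=1}^{i-1} X_i X_j Y_i\right) = \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} E(X_i X_j Y_i)$
          $= \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} E\{Y_i\, E(X_i X_j \mid Y_i)\}$    (Problem 4.7.5)
          $= \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} E\{Y_i\, EX_i\, E(X_j \mid Y_i)\}$    ($X_j$ and $Y_i$ are independent of $X_i$)
          $= 0$,

and the proof is complete.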
Theorem 3 (Wolfowitz [143]). Let $X = (X_1, X_2, \ldots)$ be an infinite sequence
of iid rv's with common pmf (pdf) $f_\theta(x)$, $\theta \in \Theta$. Suppose that the
following regularity conditions hold:

(i) $\Theta$ is an open interval of the real line which may consist of the entire
    line or of an entire half-line.

(ii) $f_\theta(x)$ is differentiable with respect to $\theta$ for all $\theta \in \Theta$
    and for all $x$. Define $\partial \log f_\theta(x)/\partial\theta = 0$ whenever
    $f_\theta(x) = 0$, so that $\partial \log f_\theta/\partial\theta$ is defined for all
    $\theta$ in $\Theta$ and all $x$.

(iii) The integral (the sum) $\int f_\theta(x)\, dx$ $\left(\sum_x f_\theta(x)\right)$ can be
    differentiated under the integral (summation) sign.

(iv) $0 < E_\theta\left[\partial \log f_\theta(X)/\partial\theta\right]^2 < \infty$ for all $\theta$.


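Let $\psi(\theta)$ be an estimable function that is differentiable on $\Theta$. Let
$h(X_1, X_2, \ldots, X_N)$ be an unbiased estimate of $\psi(\theta)$ such that

(v) for all $\theta$,

       $E_\theta\left(\sum_{i=1}^{N} \left|\dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right|\right)^2 < \infty$,

(vi) the equation

       $E_\theta h(X_1, X_2, \ldots, X_N) = \psi(\theta)$

    can be differentiated under the expectation sign with respect to all $\theta$
    in $\Theta$ and for each $n = 1, 2, \ldots$, and

(vii) the series $\sum_{n=1}^{\infty} dp_n(\theta)/d\theta$ converges uniformly on
    $\Theta$, where

       $p_n(\theta) = \int_{\{N = n\}} h(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} f_\theta(x_i)\, dx_i$

    $\left(\text{or } \sum_{\{N = n\}} h(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} f_\theta(x_i)\right)$.

Then

(7)    $\mathrm{var}_\theta\{h(X_1, X_2, \ldots, X_N)\} \ge \dfrac{\{\psi'(\theta)\}^2}{E_\theta N\; E_\theta\left[\partial \log f_\theta(X)/\partial\theta\right]^2}$

for all $\theta$, provided that $E_\theta N < \infty$.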
Remark 3. Conditions (i) to (iv) are the same as those assumed in Theorem
8.5.1. Conditions (v), (vi), and (vii) are needed because of the sequential
nature of the procedure. For (vi) to hold, what we need is the existence,
for each integer $n$, of a nonnegative statistic $T_n(x_1, \ldots, x_n)$ such that

       $\left| h(x_1, x_2, \ldots, x_n)\, \dfrac{\partial}{\partial\theta} \prod_{i=1}^{n} f_\theta(x_i) \right| \le T_n(x_1, x_2, \ldots, x_n)$

for all $\theta \in \Theta$ and all $(x_1, x_2, \ldots, x_n)$ in $\{N = n\}$, and that
$\int_{\{N = n\}} T_n(x_1, \ldots, x_n) \prod_{i=1}^{n} dx_i$ [or the sum
$\sum_{\{N = n\}} T_n(x_1, \ldots, x_n)$] is finite.


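Proof of the Theorem. Let $N$ be the stopping variable associated with the
sequential procedure. The rv's $(\partial/\partial\theta) \log f_\theta(X_i)$,
$i = 1, 2, \ldots$, are iid with

       $E_\theta\left[\dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right] = 0$    for all $i$,

       $0 < E_\theta\left[\dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right]^2 < \infty$.

Since $E_\theta N < \infty$, Wald's equation implies that

       $E_\theta\left[\sum_{i=1}^{N} \dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right] = E_\theta N\; E_\theta\left[\dfrac{\partial \log f_\theta(X_1)}{\partial\theta}\right] = 0$    for all $\theta$.

From condition (v) and Theorem 2 we obtain

       $E_\theta\left[\sum_{i=1}^{N} \dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right]^2 = E_\theta N\; E_\theta\left[\dfrac{\partial \log f_\theta(X_1)}{\partial\theta}\right]^2$.

Now

       $\mathrm{cov}_\theta\left\{h(X_1, \ldots, X_N),\ \sum_{i=1}^{N} \dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right\}$

          $= E_\theta\left\{[h(X_1, \ldots, X_N) - \psi(\theta)] \sum_{i=1}^{N} \dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right\}$

          $\le \left\{E_\theta[h(X_1, \ldots, X_N) - \psi(\theta)]^2\right\}^{1/2} \left\{E_\theta\left[\sum_{i=1}^{N} \dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right]^2\right\}^{1/2}$

for all $\theta$. But by conditions (vi) and (vii)

       $\psi'(\theta) = \dfrac{d}{d\theta} E_\theta h(X_1, \ldots, X_N)
                     = E_\theta\left\{h(X_1, \ldots, X_N) \sum_{i=1}^{N} \dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right\}$,

and the result follows.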


Remark 4. In the case of a fixed sample size procedure, if there exists an
estimate that attains the lower bound of the Fréchet-Cramér-Rao inequality,
nothing is gained by proceeding sequentially. If we restrict attention to
sequential estimation procedures for which $EN \le n_0$, and the regularity
conditions of Theorem 3 hold, then, for every unbiased estimate $T(X)$ of $\theta$,

       $\mathrm{var}_\theta(T) \ge \dfrac{1}{E_\theta N\; E_\theta[\partial \log f_\theta(X)/\partial\theta]^2} \ge \dfrac{1}{n_0\, E_\theta[\partial \log f_\theta(X)/\partial\theta]^2}$.


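Example 1. In Example 14.2.5, conditions (i) through (v) of Theorem 3 are
satisfied. Thus for any unbiased estimate $h(X_1, X_2, \ldots, X_N)$ of $\theta$, we have

       $\mathrm{var}_\theta\{h(X_1, X_2, \ldots, X_N)\} \ge \dfrac{1}{E_\theta N\; E_\theta[\partial \log f_\theta(X)/\partial\theta]^2}$,    $\theta \in (0, 1)$.

Since $f_\theta(x) = \theta^x (1 - \theta)^{1-x}$, $\theta \in (0, 1)$,

       $\dfrac{\partial \log f_\theta(x)}{\partial\theta} = \dfrac{x - \theta}{\theta(1 - \theta)}$

and

       $E_\theta\left[\dfrac{\partial \log f_\theta(X)}{\partial\theta}\right]^2 = \dfrac{\theta(1 - \theta)}{\theta^2 (1 - \theta)^2} = \dfrac{1}{\theta(1 - \theta)}$,

so that, since $E_\theta N = 1/\theta$,

       $\mathrm{var}_\theta\{h(X_1, X_2, \ldots, X_N)\} \ge \theta^2 (1 - \theta)$

for any unbiased estimate $h$ of $\theta$ satisfying conditions (vi) and (vii) of
Theorem 3.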


PROBLEMS 14.3 


1. Prove result (4), stated in Remark 2.

2. In Problem 14.2.2 find the lower bound for the variance of an unbiased
estimate of $\theta$.

3. If $X_j$ and $Y_i$ are independent of $X_i$, verify that
$E\{X_i X_j \mid Y_i\} = EX_i\, E(X_j \mid Y_i)$.


14.4 SEQUENTIAL ESTIMATION OF THE MEAN OF A NORMAL
POPULATION


Let $X_1, X_2, \ldots, X_n$ be iid normal rv's with mean $\mu$ and variance $\sigma^2$,
both unknown. In this section we consider the sequential point and interval
estimation of $\mu$. We begin with the sequential unbiased estimation procedure
suggested by Robbins [100]. As an estimate of $\mu$ we choose $\bar X_n$, the sample
mean. The problem now is to choose $n$. Let us assume that the loss incurred
is $A|\bar X_n - \mu|$, where $A > 0$ is a known constant, and let each observation
cost one unit. Then we wish to choose $n$ to minimize

(1)    $EL(n) = E\{A|\bar X_n - \mu| + n\}$.

We have

       $E|\bar X_n - \mu| = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{2}{\pi}}$,

so that

(2)    $EL(n) = A\, E|\bar X_n - \mu| + n = A\sigma \sqrt{\dfrac{2}{\pi n}} + n$.

Regarding $n$ as a continuous variable, we differentiate $EL(n)$ and set the
derivative equal to 0 to get

(3)    $n_0 = \left(\dfrac{A^2 \sigma^2}{2\pi}\right)^{1/3}$

as the value of $n$ that minimizes (2). For this value of $n$ (which may not be
integral), we have

(4)    $\nu(\sigma) = EL(n_0) = A\sigma \sqrt{\dfrac{2}{\pi n_0}} + n_0 = 2n_0 + n_0$
                  $= 3n_0$,

so that the loss due to the error of estimate is twice the size of the sample,
that is, twice the cost of sampling. Of course, this presupposes knowledge of
$\sigma$. If we do not know $\sigma$, we cannot compute $n_0$.

For the case where $\sigma$ is unknown, Robbins [100] suggested the following
sequential sampling procedure $R$:

(5)    Sample sequentially, observing $x_1, x_2, \ldots$. Compute
       $\bar x_n = n^{-1} \sum_{i=1}^{n} x_i$ and
       $s_n^2 = (n - 1)^{-1} \sum_{i=1}^{n} (x_i - \bar x_n)^2$ successively, and stop
       sampling the first time for $n \ge 2$ when

       $n \ge \left(\dfrac{A s_n}{\sqrt{2\pi}}\right)^{2/3}$.

We may rewrite this inequality as

(6)    $\sum_{i=1}^{n} (x_i - \bar x_n)^2 \le \dfrac{2\pi}{A^2}\, n^3 (n - 1)$.


To study the properties of the stopping rule $R$, we will need some preliminary
lemmas. First we note that the sampling procedure is closed.


Lemma 1. Rule $R$ terminates with probability 1.

This result follows from the inequality in (5), since the right-hand side
converges to a constant $n_0$ with probability 1 (see Theorem 6.4.7).


Lemma 2. For any fixed $n$, $\bar X_n$ is independent of the set of rv's
$S_2^2, S_3^2, \ldots, S_n^2$, and hence

(7)    $P\left\{\dfrac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \le x \,\Big|\, S_2^2, S_3^2, \ldots, S_n^2\right\} = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$.


Proof. Define

       $U_i = \dfrac{X_i - \mu}{\sigma}$,    $i = 1, 2, \ldots, n$.

Then $U_i$ are iid $\mathcal{N}(0, 1)$ rv's. Let us write

       $Y_i = \dfrac{U_1 + U_2 + \cdots + U_i - i\,U_{i+1}}{\sqrt{i(i + 1)}}$,    $i = 1, 2, \ldots, n - 1$,

and

       $Y_n = \sqrt{n}\, \bar U_n$,

where $\bar U_n = n^{-1} \sum_{i=1}^{n} U_i$. Then the rv's $Y_i$ are also
$\mathcal{N}(0, 1)$ and independent. Now

       $Y_n = \dfrac{\sqrt{n}(\bar X_n - \mu)}{\sigma}$

and

       $S_i^2 = \dfrac{\sigma^2 (Y_1^2 + Y_2^2 + \cdots + Y_{i-1}^2)}{i - 1}$,    $i = 2, \ldots, n$.

It follows that $Y_n$ is independent of $S_i^2$ for $i = 2, \ldots, n$. This is the
same as saying that $\bar X_n$ is independent of $S_2^2, S_3^2, \ldots, S_n^2$.


In fact, we have also shown that the following result holds.


Lemma 3. The joint distribution of the sequence of rv's $\{S_n^2\}$,
$n = 2, 3, \ldots$, is the same as that of the sequence

       $\left\{\dfrac{\sigma^2 (Y_1^2 + Y_2^2 + \cdots + Y_{n-1}^2)}{n - 1}\right\}$,

where the $Y_i$'s are iid $\mathcal{N}(0, 1)$.


Lemma 4. Let $R^*$ be any sequential stopping rule that terminates with
probability 1, and suppose that the sample size $N$ is determined uniquely
in such a way that $N = n$ if and only if the point $(S_2^2, S_3^2, \ldots, S_n^2)$
belongs to some set $\mathfrak{B}_n^* \subseteq \mathcal{R}_{n-1}$. Then, for any
$n = 2, 3, \ldots$,

(8)    $P\left\{\dfrac{\sqrt{N}(\bar X_N - \mu)}{\sigma} \le x \,\Big|\, N = n\right\} = \Phi(x)$,

where

       $\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$;

hence the unconditional distribution of $\sqrt{N}(\bar X_N - \mu)/\sigma$ is also
$\mathcal{N}(0, 1)$.

Proof. The proof is left as an exercise.

Lemma 5. For the sampling procedure $R$ the probability distribution of $N$
may be obtained as follows. Let $\{Y_n\}$, $n = 1, 2, \ldots$, be iid
$\mathcal{N}(0, 1)$ rv's, and let

(9)    $N = n \iff n \ge 2$ is the first integer such that

       $Y_1^2 + Y_2^2 + \cdots + Y_{n-1}^2 \le \dfrac{(n - 1)\, n^3}{n_0^3}$.

Lemma 5 follows from Lemma 3 and definitions of $n_0$, $Y_i$, and so forth.
Let us now compute the average loss for $R$. We have

(10)   $L(N) = A|\bar X_N - \mu| + N$,

so that

       $\bar\nu(\sigma) = E_R L(N) = \sum_{n=2}^{\infty} P\{N = n\}\, E\{L(N) \mid N = n\}$
                     $= \sum_{n=2}^{\infty} P\{N = n\} \left(A\sigma \sqrt{\dfrac{2}{\pi n}} + n\right)$    by Lemma 4
                     $= A\sigma \sqrt{\dfrac{2}{\pi}}\, E(N^{-1/2}) + EN$
(11)                 $= 2 n_0^{3/2}\, E(N^{-1/2}) + EN$.

The risk for the procedure $R$, given by (11), is to be compared with the
risk for the fixed sample size procedure, given by (4). One needs therefore
the distribution of $N$, which is not easy to compute. Ideally, one would like
$EN$ to be close to $n_0$, whatever $\sigma^2$ may be. In fact, at least for large
$n_0$, one can show that

       $P\{N \le n_0\} \gtrsim \tfrac{1}{2}$,

so that for at least half the time $N$ is no larger than $n_0$. Note that, for any
fixed integer $n$,

       $P\{N \le n\} \ge P\left\{Y_1^2 + \cdots + Y_{n-1}^2 \le \dfrac{(n - 1)\, n^3}{n_0^3}\right\}$.

It follows that, for $n = n_0$,

       $P\{N \le n_0\} \ge P\{Y_1^2 + \cdots + Y_{n_0 - 1}^2 \le n_0 - 1\}$
                      $= P\{\chi^2(n_0 - 1) \le n_0 - 1\}$
(12)                  $\approx P\{Z \le 0\} = \tfrac{1}{2}$,

where $Z \sim \mathcal{N}(0, 1)$.

Robbins presents some numerical results comparing $n_0$ and $\nu(\sigma)$ with values
obtained by taking, for each $n_0$, a thousand sequences $\{y_n\}$ of normal (0, 1)
rv's and computing the average values $E^*N$ and $E^*L$.

       $n_0$:                              4        16        64

       $E^*N$:                             3.66     15.49     63.84
       $E_R^* L$:                          12.49    49.92     193.89
       $E_{n_0} L\ (= 3n_0)$:              12.00    48.00     192.00
       Regret $= E_R^* L - E_{n_0} L$:     .49      1.92      1.89
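
A Monte Carlo experiment of the kind described above can be sketched with the representation of Lemma 5, which needs only iid standard normal variables. The function names, the seed, and the number of replications below are illustrative assumptions; the output is a simulation estimate and will not reproduce the tabulated figures exactly.

```python
import random


def simulate_stopping_time(n0: float, rng: random.Random) -> int:
    """Draw N via Lemma 5: first n >= 2 with Y_1^2 + ... + Y_{n-1}^2 <= (n-1) n^3 / n0^3."""
    total = 0.0
    n = 1
    while True:
        n += 1
        total += rng.gauss(0.0, 1.0) ** 2   # adds Y_{n-1}^2
        if total <= (n - 1) * n ** 3 / n0 ** 3:
            return n


def average_sample_number(n0: float, reps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of EN under rule R for a given n0."""
    rng = random.Random(seed)
    return sum(simulate_stopping_time(n0, rng) for _ in range(reps)) / reps


for n0 in (4, 16, 64):
    print(n0, average_sample_number(n0))
```

Working with $n_0$ directly, rather than with $A$ and $\sigma$ separately, reflects the fact that the distribution of $N$ under $R$ depends on the parameters only through $n_0$.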


These results have been substantiated in subsequent studies by many
authors, and the procedure $R$ has been shown to have the following limiting
properties.

(a) $P\{\lim_{\sigma \to \infty} n_0^{-1} N = 1\} = 1$.

(b) For $w > 0$, $\lim_{\sigma \to \infty} E(N/n_0)^w = 1$.

(c) If $\eta(\sigma) = \bar\nu(\sigma)/\nu(\sigma)$, then $\lim_{\sigma \to \infty} \eta(\sigma) = 1$.

(d) If $w(\sigma) = \bar\nu(\sigma) - \nu(\sigma)$ is the regret, then $w(\sigma)$ is bounded as
    $\sigma \to \infty$.

We will not prove these results here, but refer the reader to Starr and
Woodroofe [124] for further references. We note that, at least for large-sample
values ($\sigma \to \infty$), the sample size required by $R$ will be close to $n_0$, the
average sample number is close to $n_0$, and the ratio of average loss using $R$ to
average loss using a fixed sample size procedure (when $\sigma$ is known) tends
to 1. Moreover, the regret, that is, the average loss due to ignorance of
$\sigma$, is bounded, at least for large-sample values.

We next consider the sequential interval estimation of the mean $\mu$. Let
$X_1, X_2, \ldots$ be independent $\mathcal{N}(\mu, \sigma^2)$ rv's, and suppose that we
require a confidence interval for $\mu$ of width at most $2d$ ($d > 0$) at confidence
level $1 - \alpha$ ($0 < \alpha < 1$). We have seen (Example 14.2.3) that, if $\sigma$ is
known, a confidence interval $I_n = (\bar X_n - d, \bar X_n + d)$ for $\mu$ of width $2d$
and at level $1 - \alpha$ is assured, provided that we choose $n$ to be the (smallest)
integer satisfying

(13)   $n \ge \dfrac{z_{\alpha/2}^2 \sigma^2}{d^2}$.

If $\sigma$ is unknown, no procedure based on a fixed number of observations
$n$ will satisfy our requirements (see Dantzig [21]). In this situation two
choices are open to us. The first option is to take a preliminary sample of
size $m$ and estimate $\sigma^2$. This procedure (due to Stein [125]), which leads to a
sample size that approximately satisfies (13), is simple enough. Take a first
sample of size $m$ (fixed) and compute

       $\bar X_m = m^{-1} \sum_{i=1}^{m} X_i$    and    $S_m^2 = (m - 1)^{-1} \sum_{i=1}^{m} (X_i - \bar X_m)^2$.

Then take a further sample of size $N - m$, where

(14)   $N$ is the smallest integer $\ge \dfrac{t_{m-1,\alpha/2}^2\, S_m^2}{d^2}$,

and use as confidence interval

(15)   $\left(\bar X_N - t_{m-1,\alpha/2} \dfrac{S_m}{\sqrt{N}},\ \bar X_N + t_{m-1,\alpha/2} \dfrac{S_m}{\sqrt{N}}\right)$,

where $\bar X_N = N^{-1} \sum_{i=1}^{N} X_i$. The length of this confidence interval is
$2\, t_{m-1,\alpha/2}\, S_m/\sqrt{N}$, so that, if we choose $N$ to be the smallest integer
satisfying

(16)   $N \ge \dfrac{t_{m-1,\alpha/2}^2\, S_m^2}{d^2}$,

the confidence interval will have width $\le 2d$. If $N < m$, the confidence
interval based on the first sample itself will have width $\le 2d$, so that no
further observations are needed and the total sample size is $m$.
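
The two-stage procedure just described can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the $t$ quantile $t_{m-1,\alpha/2}$ is passed in by the caller (the standard library has no $t$ quantile function, so it would come from tables), and the total sample size is taken to be the larger of $m$ and the integer defined by (14).

```python
import math
import statistics


def stein_total_sample_size(first_sample, d: float, t_quantile: float):
    """Total sample size from the first-stage variance, and that variance."""
    m = len(first_sample)
    s2_m = statistics.variance(first_sample)            # divisor m - 1
    n_required = math.ceil(t_quantile ** 2 * s2_m / d ** 2)
    return max(m, n_required), s2_m


def stein_interval(all_observations, s_m: float, t_quantile: float):
    """Interval (15): mean of all N observations, half-width based on S_m only."""
    n = len(all_observations)
    center = statistics.fmean(all_observations)
    half_width = t_quantile * s_m / math.sqrt(n)
    return center - half_width, center + half_width


# Hypothetical first sample of size m = 5; t_{4, 0.025} = 2.776 is taken from tables.
total_n, s2_m = stein_total_sample_size([1.0, 2.0, 3.0, 4.0, 5.0], d=1.0, t_quantile=2.776)
print(total_n, s2_m)
```

Note that the half-width uses the first-stage estimate $S_m$ even after all $N$ observations are in hand; this is what makes the $t$ distribution with $m - 1$ d.f. applicable.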

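We need to show that confidence interval (15) has the required level
($1 - \alpha$). Note that $N$ is now a random variable. It will be sufficient to show
that $\sqrt{N}(\bar X_N - \mu)/S_m$ has a $t$-distribution with $m - 1$ d.f., for then

       $P\left\{\bar X_N - t_{m-1,\alpha/2} \dfrac{S_m}{\sqrt{N}} < \mu < \bar X_N + t_{m-1,\alpha/2} \dfrac{S_m}{\sqrt{N}}\right\}$
          $= P\left\{-t_{m-1,\alpha/2} < \dfrac{\sqrt{N}(\bar X_N - \mu)}{S_m} < t_{m-1,\alpha/2}\right\}$
          $= 1 - \alpha$.

To see that $\sqrt{N}(\bar X_N - \mu)/S_m$ has a $t(m - 1)$ distribution, we write

(17)   $\bar X_N = \dfrac{\sum_{i=1}^{m} X_i + \sum_{i=m+1}^{N} X_i}{N} = \dfrac{m \bar X_m + \sum_{i=m+1}^{N} X_i}{N}$.

Now $S_m$ is independent of $\bar X_m$, and since $N$ depends only on $S_m$, the
conditional distribution of $\bar X_N$, given $N = k$, is $\mathcal{N}(\mu, \sigma^2/k)$. It
follows that $(\bar X_N - \mu)\sqrt{N}$, conditional on $N = k$, is
$\mathcal{N}(0, \sigma^2)$, and hence the unconditional distribution of
$(\bar X_N - \mu)\sqrt{N}$ is also $\mathcal{N}(0, \sigma^2)$. Consequently,

       $\dfrac{\sqrt{N}(\bar X_N - \mu)}{S_m} = \dfrac{(\bar X_N - \mu)\sqrt{N}/\sigma}{S_m/\sigma}$

has a $t(m - 1)$ distribution, as asserted.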
As for the pmf of $N$, we have

(18)   $P\{N = m\} = P\left\{m \ge \dfrac{t_{m-1,\alpha/2}^2\, S_m^2}{d^2}\right\}
                  = P\left\{\chi^2(m - 1) \le \dfrac{m(m - 1)\, d^2}{\sigma^2\, t_{m-1,\alpha/2}^2}\right\}$.

Also, for $k > m$,

(19)   $P\{N = k\} = P\left\{k - 1 < \dfrac{t_{m-1,\alpha/2}^2\, S_m^2}{d^2} \le k\right\}$
                  $= P\left\{\dfrac{(k - 1)(m - 1)\, d^2}{\sigma^2\, t_{m-1,\alpha/2}^2} < \chi^2(m - 1) \le \dfrac{k(m - 1)\, d^2}{\sigma^2\, t_{m-1,\alpha/2}^2}\right\}$,

so that the pmf of $N$ depends only on $\sigma$. Also, for $k \ge m$,

(20)   $P\{N \le k\} = P\left\{\chi^2(m - 1) \le \dfrac{k(m - 1)\, d^2}{\sigma^2\, t_{m-1,\alpha/2}^2}\right\}$.

Moreover,

       $EN = m\, P\{N = m\} + \sum_{k=m+1}^{\infty} k\, P\{N = k\}$.

Let

       $y = \dfrac{m(m - 1)\, d^2}{\sigma^2\, t_{m-1,\alpha/2}^2}$.

Then

(21)   $EN = m\, P\{\chi^2(m - 1) \le y\} + \sum_{k=m+1}^{\infty} k\, P\left\{\dfrac{(k - 1)\, y}{m} < \chi^2(m - 1) \le \dfrac{k y}{m}\right\}$.

On the event $\{(k - 1)y/m < \chi^2(m - 1) \le ky/m\}$ we have
$(m/y)\, \chi^2(m - 1) \le k < (m/y)\, \chi^2(m - 1) + 1$. It follows that, writing
$W$ for a $\chi^2(m - 1)$ rv,

       $m\, P\{W \le y\} + \dfrac{m}{y}\, E\{W;\ W > y\} \le EN
          \le m\, P\{W \le y\} + \dfrac{m}{y}\, E\{W;\ W > y\} + P\{W > y\}$.

Since $E\{W;\ W > y\} = (m - 1)\, P\{\chi^2(m + 1) > y\}$ and
$(m/y)(m - 1) = \sigma^2 t_{m-1,\alpha/2}^2/d^2$, we may rewrite this as

(22)   $m\, P\{\chi^2(m - 1) \le y\} + \dfrac{\sigma^2 t_{m-1,\alpha/2}^2}{d^2}\, P\{\chi^2(m + 1) > y\}$
          $\le EN \le m\, P\{\chi^2(m - 1) \le y\} + \dfrac{\sigma^2 t_{m-1,\alpha/2}^2}{d^2}\, P\{\chi^2(m + 1) > y\}$
          $+ P\{\chi^2(m - 1) > y\}$.

If $n_0$ is the number of observations required for a fixed sample size procedure
(when $\sigma$ is known),

(23)   $EN < m + 1 + n_0$,

at least when $m$ is large enough so that $t_{m-1,\alpha/2} \approx z_{\alpha/2}$.


Remark 1. The basic difficulty of Stein's approach is that there is no control
on $N$. If $\sigma$ is large, $N$ may be very large. Moreover, we lose a lot of
information in the estimation of $\sigma^2$, since we use only the first sample for
this purpose. Another problem is the determination of $m$: how large or small
should it be? Seelbinder [112] and Moshman [83] have considered this problem,
and we leave the interested reader to pursue the subject.


Stein's procedure can also be used to test the hypothesis $\mu \le \mu_0$ against
$\mu > \mu_0$ whenever $\sigma$ is not known. We will not go into the procedure here
but refer the reader to Stein [125]. As a by-product of the procedure we obtain
the following result.


Theorem 1. $\bar X_N$ is an unbiased estimate of $\mu$ with variance
$\le [(m - 1)/(m - 3)]\, (d^2/t_{m-1,\alpha/2}^2)$ for $m > 3$.

Proof. We have seen that

       $\dfrac{\sqrt{N}(\bar X_N - \mu)}{\sigma}$    and    $\dfrac{(m - 1)\, S_m^2}{\sigma^2}$

are independent rv's with $\mathcal{N}(0, 1)$ and $\chi^2(m - 1)$ distributions,
respectively. It follows that $\sqrt{N}(\bar X_N - \mu)/S_m \sim t(m - 1)$. Now

       $E_\mu\{\bar X_N - \mu\} = E_\mu\left\{\dfrac{\sqrt{N}(\bar X_N - \mu)}{\sigma} \cdot \dfrac{\sigma}{\sqrt{N}}\right\}$
                            $= E\left\{\dfrac{\sqrt{N}(\bar X_N - \mu)}{\sigma}\right\} E\left\{\dfrac{\sigma}{\sqrt{N}}\right\}$,

since $N$ is a function of $S_m$ alone, and $\sqrt{N}[(\bar X_N - \mu)/\sigma]$ is
independent of $S_m$. It follows that $E_\mu(\bar X_N - \mu) = 0$, provided that
$EN^{-1/2} < \infty$. Moreover,

       $E_\mu(\bar X_N - \mu)^2 = E_\mu\left\{\dfrac{N(\bar X_N - \mu)^2}{S_m^2} \cdot \dfrac{S_m^2}{N}\right\}$
                            $\le E_\mu\left\{\dfrac{N(\bar X_N - \mu)^2}{S_m^2}\right\} \dfrac{d^2}{t_{m-1,\alpha/2}^2}$    by (14)
                            $= \dfrac{m - 1}{m - 3} \cdot \dfrac{d^2}{t_{m-1,\alpha/2}^2}$,

since the second moment of a $t(m - 1)$ rv is $(m - 1)/(m - 3)$ for $m > 3$.
We leave the reader to show that $EN^{-1/2} < \infty$ by repeating the argument
used in obtaining (22).


Finally, we investigate briefly the second option, namely, to sample
sequentially and use all the information in the sample to date to estimate
$\sigma^2$. The following sequential procedure has received wide attention. Proceed
sequentially, and at each stage compute

       $\bar X_n = n^{-1} \sum_{i=1}^{n} X_i$    and    $S_n^2 = (n - 1)^{-1} \sum_{i=1}^{n} (X_i - \bar X_n)^2$.

Stop with observation $X_N$, where

(24)   $N$ = smallest integer $n \ge n_0$ such that

       $S_n^2 \le \dfrac{n\, d^2}{t_{n-1,\alpha/2}^2}$,

where $n_0 \ge 2$ is a fixed integer. Then form the interval
$I_N = (\bar X_N - d,\ \bar X_N + d)$. The interval $I_N$ is of fixed length $2d$, and,
it is hoped, has level $1 - \alpha$.

We first note that by the strong law $S_n^2 \to \sigma^2$ a.s., so that

(25)   $P\{N < \infty\} = 1 - P\{N = \infty\}$
                      $= 1 - P\left\{S_n^2 > \dfrac{n\, d^2}{t_{n-1,\alpha/2}^2} \text{ for all } n \ge n_0\right\}$
                      $= 1$.

Thus the sampling terminates with probability 1. If we write

       $J_n = \left(\bar X_n - t_{n-1,\alpha/2} \dfrac{S_n}{\sqrt{n}},\ \bar X_n + t_{n-1,\alpha/2} \dfrac{S_n}{\sqrt{n}}\right)$,

then, for fixed $\mu$, $\sigma$, and $n$,

(26)   $P\{J_n \ni \mu\} = P\left\{\dfrac{\sqrt{n}\,|\bar X_n - \mu|}{S_n} < t_{n-1,\alpha/2}\right\} = 1 - \alpha$.

If $N = n$, then

       $t_{n-1,\alpha/2} \dfrac{S_n}{\sqrt{n}} \le d$,

and it follows that $I_n \supseteq J_n$. If we can use (26) to show that
$P\{J_N \ni \mu\} \ge 1 - \alpha$, $I_N$ will have the required confidence level. Since
$S_n^2 \to \sigma^2$ a.s., and since $t_{n-1,\alpha/2} \to z_{\alpha/2}$ as $n \to \infty$, at
least for large $N$,

       $N \approx \dfrac{z_{\alpha/2}^2 \sigma^2}{d^2}$,

so that $EN \approx z_{\alpha/2}^2 (\sigma^2/d^2)$, where $z_{\alpha/2}^2 (\sigma^2/d^2)$ is the smallest
sample size required if $\sigma$ is known.

Starr [123] has shown that

(a) $EN < \infty$ for all $\sigma > 0$ and $d > 0$,

(b) $\lim_{d \to 0} P\{I_N \ni \mu\} = 1 - \alpha$,

(c) $\lim_{d \to 0} (EN/c) = 1$, $0 < \sigma < \infty$,

(d) $\lim_{d \to 0} (N/c) = 1$ a.s., $0 < \sigma < \infty$,

where $c$ = smallest integer $n$ such that $n \ge z_{\alpha/2}^2 (\sigma^2/d^2)$.

Starr also presents some numerical results which indicate that

       $P\{|\bar X_N - \mu| < d\} \ge 1 - \alpha$    for all $\mu$, $\sigma^2$

is nearly achieved. Result (b) has been proved by Chow and Robbins [13]
for any rv $X$ with continuous distribution. Simons [117] has also studied the
sequential procedure and shows that

       $EN \le c + n_0 + 1$    for all $d > 0$, $\sigma > 0$,

provided that we replace $t_{n-1,\alpha/2}$ by $z_{\alpha/2}$ in the definition of $N$.
Moreover, he shows that the cost of not knowing $\sigma$ is a finite number of
observations in that, with $t_{n-1,\alpha/2}$ replaced by $z_{\alpha/2}$ in the definition
of $N$ and taking $n_0 \ge 3$, we have, for some finite integer $k \ge 0$,

       $P\{|\bar X_{N+k} - \mu| < d\} \ge 1 - \alpha$    for all $\mu$, $\sigma^2$, $d$

and

       $E(N + k) \le c + n_0 + k$    for all $\mu$, $\sigma^2$, $d$.

We will not pursue this subject any further. The mathematics involved is
beyond the scope of this book, and we intended only to give an idea of the
type of results obtained. The procedure, in any case, is quite simple and clear.
For further details and proofs we refer to the papers cited above and to the
book of Zacks [146], Chapter 10.


Remark 2. As we remarked earlier, the subject of sequential estimation is
under active investigation. Ideas similar to the one used by Robbins [100]
have been employed by several authors in the case of sampling from specific
nonnormal populations. We refer, in particular, to the papers by Richter
[99], Ghurye and Robbins [35], Simons and Zacks [116], and Zacks [145].


PROBLEMS 14.4

1. In the problem of sequential estimation of the mean of a normal distribution
with unknown variance $\sigma^2$, let us consider the following alternative loss
functions:

(a) $L(n) = A|\bar X_n - \mu|^s + n$, where $s > 0$;
(b) $L(n) = A|\bar X_n - \mu|^s + \log n$, where $s > 0$.

In each case define a sequential sampling procedure analogous to (5). Show that
Lemmas 1 through 5 hold. Also, compute the average loss for the sequential
procedure.

2. Let $(X_1, Y_1), (X_2, Y_2), \ldots$ be independent observations on a bivariate
normal rv $(X, Y)$ with mean vector $(\mu_1, \mu_2)$, and suppose that observations
are taken sequentially. How will you estimate $\mu_1 - \mu_2$?


3. Prove Lemma 4.


4 ; D 
145 THE SEQUENTIAL PROBABILITY RATIO TEST 


Let X, Xj, X», «+ be a sequence of iid rv's with common pdf (pmf) fj(x)." 
In this section we study n sequential probability ratio test for testing а 
simple hypothesis, Ho: X ~ fj, against a simple alternative, Hi: X ~ fy, 
when the observations are SK sequentially. 
Let Jon and fip denote the joint pdf's (pmf's) of Xi, Xz ---, X, under Но 
and H;, respectively. Then 


fitv Xo s x) = П j =o. 


Write 

VEM AY Find) 
e AQ Xo n Xp) = y ОР 
where loisi х = (XXa ns xy). 


Definition 1. The sequential probability ratio test (SPRT) for testing $H_0$
against $H_1$ is a rule that states:
(i) if at any stage of the sampling

\[ \lambda_n(\mathbf{x}) \geq A, \tag{2} \]

stop and reject $H_0$;
(ii) if

\[ \lambda_n(\mathbf{x}) \leq B, \tag{3} \]

stop and accept $H_0$ (reject $H_1$); and
(iii) if

\[ B < \lambda_n(\mathbf{x}) < A, \tag{4} \]

continue sampling by taking another observation $x_{n+1}$.

Here $A$ and $B$ $(A > B)$ are constants determined so that the test
will have strength $(\alpha, \beta)$, where

\[ \alpha = P\{\text{Type I error}\} = P\{\text{Reject } H_0 \mid H_0\}, \tag{5} \]
\[ \beta = P\{\text{Type II error}\} = P\{\text{Accept } H_0 \mid H_1\}. \tag{6} \]

If $N$ is the stopping rv, then
\[ \alpha = P_{H_0}\{\lambda_N(\mathbf{X}) \geq A\}, \qquad \beta = P_{H_1}\{\lambda_N(\mathbf{X}) \leq B\}. \]


Remark 1. The SPRT defined by (2), (3), and (4) is due to Wald [134] and
is the best procedure in a certain sense to be explained later. The test is
suggested by the Neyman-Pearson fundamental lemma. Here both $\alpha$ and
$\beta$ are fixed at a preassigned level, but sample size is not fixed in advance.
Note that the assumption of independence and identical distribution on
the rv's $X_i$ is not necessary to define the test; it is merely a simplifying
assumption.


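Let us write

\[ Z_i = \log \frac{f_{\theta_1}(X_i)}{f_{\theta_0}(X_i)}. \tag{7} \]

Then the rv's $Z_1, Z_2, \ldots$ are also iid rv's, and

\[ \log \lambda_n(\mathbf{X}) = \sum_{i=1}^{n} Z_i = S_n, \text{ say.} \tag{8} \]

In terms of $S_n$ the SPRT (Fig. 1) may be written as follows:
At each stage compute $s_n = \sum_{i=1}^{n} z_i$. If

\[ b = \log B < s_n < \log A = a, \tag{9} \]

continue by taking observation $z_{n+1}$; if

\[ a \leq s_n, \quad \text{reject } H_0; \tag{10} \]

and if

\[ b \geq s_n, \quad \text{reject } H_1. \tag{11} \]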
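The rule (9) through (11) can be sketched in a few lines of code. The following is a minimal illustration, not part of the original text: the function name `sprt`, its return values, and the round-off tolerance `tol` are assumptions made here for the example. It takes the increments $z_1, z_2, \ldots$ and the bounds $A$, $B$, and reports the decision and the stage at which it is reached.

```python
import math

def sprt(increments, A, B, tol=1e-12):
    """Run the SPRT on log-likelihood-ratio increments z_1, z_2, ...

    Returns (decision, n), where decision is "reject H0", "reject H1",
    or "continue" if the data run out before a boundary is reached.
    `tol` is a hypothetical guard against floating-point round-off when
    the partial sum lands exactly on a boundary.
    """
    a, b = math.log(A), math.log(B)   # a = log A, b = log B, as in (9)
    s = 0.0                           # running partial sum s_n
    n = 0
    for n, z in enumerate(increments, start=1):
        s += z
        if s >= a - tol:              # (10): a <= s_n
            return "reject H0", n
        if s <= b + tol:              # (11): b >= s_n
            return "reject H1", n
    return "continue", n
```

For instance, with $a = 2$ and $b = -2$ the increments 0.5, 0.5, 1.5 give partial sums 0.5, 1.0, 2.5, so sampling stops at the third observation with rejection of $H_0$.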

Fig. 1. The SPRT in terms of $s_n$ (region above the upper boundary labeled "Reject $H_0$").

Remark 2. In a sample where $f_{1n} = 0 = f_{0n}$, we define the ratio $\lambda_n = 1$. If,
for some values $x_1, x_2, \ldots, x_n$, $f_{1n} > 0$ but $f_{0n} = 0$, the inequality $\lambda_n \geq A$ is
considered fulfilled and $H_0$ is rejected.

Remark 3. We will ignore the trivial case $P\{Z_i = 0\} = 1$ in the sequel. In
this case $f_{\theta_1} = f_{\theta_0}$, and we accept $H_0$ at the first observation or even without
taking any observation.


Remark 4. The same discussion applies to this rule: Reject $H_0$ if $\lambda_n > A$,
reject $H_1$ if $\lambda_n < B$, and continue if $B \leq \lambda_n \leq A$; decide on the boundaries
according to some fixed probabilities. If $\lambda_n$ has an absolutely continuous
df, the two rules are clearly equivalent. If $\lambda_n$ has a discrete distribution,
randomization on the boundaries is necessary to achieve a given
strength $(\alpha, \beta)$.


Example 1. Let $X_1, X_2, \ldots$ be iid rv's with a common Bernoulli law with
parameter $p$, $p \in \Theta = \{p_0, p_1\}$, $p_0 \neq p_1$. To test $H_0\colon X_i \sim b(1, p_0)$ against
$H_1\colon X_i \sim b(1, p_1)$, we compute

\[ Z_i = \begin{cases} \log \dfrac{p_1}{p_0} & \text{if } X_i = 1, \\[2ex] \log \dfrac{1 - p_1}{1 - p_0} & \text{if } X_i = 0. \end{cases} \]

If $R$ is the number of 1's in the sequence of first $n$ observations, then

\[ S_n = \sum_{i=1}^{n} Z_i = R \log \frac{p_1}{p_0} + (n - R) \log \frac{1 - p_1}{1 - p_0}. \]

We obtain partial sums $s_n$ cumulatively and accept $H_0$ if

\[ r \log \frac{p_1}{p_0} + (n - r) \log \frac{1 - p_1}{1 - p_0} \leq \log B = b, \]

reject $H_0$ if

\[ r \log \frac{p_1}{p_0} + (n - r) \log \frac{1 - p_1}{1 - p_0} \geq \log A = a, \]

and take another observation otherwise.


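Example 2. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(\theta, 1)$ rv's, where $\theta \in \Theta = \{\theta_0, \theta_1\}$. To test
$H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$, we have

\[ Z_i = \log \frac{\exp\{-\tfrac{1}{2}(X_i - \theta_1)^2\}}{\exp\{-\tfrac{1}{2}(X_i - \theta_0)^2\}} = (\theta_1 - \theta_0) X_i + \tfrac{1}{2}(\theta_0^2 - \theta_1^2) \]

and

\[ \log \lambda_n(\mathbf{X}) = \sum_{i=1}^{n} Z_i = (\theta_1 - \theta_0) \sum_{i=1}^{n} X_i + \frac{n}{2}(\theta_0^2 - \theta_1^2). \]

The quantity $\log \lambda_n(\mathbf{x})$ can be computed cumulatively if, after each
observation $x_n$, we add $(\theta_1 - \theta_0)x_n + \tfrac{1}{2}(\theta_0^2 - \theta_1^2)$ to the preceding value of
$\log \lambda_{n-1}(\mathbf{x})$.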
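The cumulative update described in Example 2 can be illustrated as follows. This is only a sketch; the function name `normal_llr_path` is hypothetical and chosen for this illustration. It returns the successive values of $\log \lambda_n(\mathbf{x})$ for $\mathcal{N}(\theta, 1)$ observations.

```python
def normal_llr_path(xs, theta0, theta1):
    """Cumulative values log lambda_n(x), n = 1, 2, ..., for N(theta, 1) data."""
    const = 0.5 * (theta0 ** 2 - theta1 ** 2)    # (theta0^2 - theta1^2) / 2
    path, s = [], 0.0
    for x in xs:
        s += (theta1 - theta0) * x + const       # increment z_n added to log lambda_{n-1}
        path.append(s)
    return path
```

With $\theta_0 = 0$ and $\theta_1 = 1$ each increment is $x_n - \tfrac{1}{2}$, so the observations 1, 0, 2 give the path 0.5, 0.0, 1.5.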


Remark 5. It is instructive to explore somewhat the connection between the
SPRT and random walks. Let us suppose that a particle at point $S_0$ undergoes
a jump $Z_1$ at time $n = 1$, a jump $Z_2$ at time $n = 2$, and so on. Let
$Z_1, Z_2, \ldots$ be iid rv's with a given distribution. The particle then moves along
the axis, and at time $n$ its position is described by

\[ S_n = S_0 + \sum_{i=1}^{n} Z_i = S_{n-1} + Z_n. \]

In particular, if $Z_1, Z_2, \ldots$ have the common pmf

\[ P\{Z_i = 1\} = p, \qquad P\{Z_i = -1\} = q, \qquad \text{and} \qquad P\{Z_i = 0\} = 1 - p - q, \]

$0 < p < 1$, $0 < q < 1$, $p + q \leq 1$, we say that $S_n$ performs a simple random
walk. In practice, it is usual to take $p = 1 - q$. A random walk is therefore
just a sequence of rv's $\{S_n\}$. The state space, that is, the set of values
assumed by $S_n$, is continuous if the $Z_i$'s are continuous rv's and discrete if the
$Z_i$'s are of the discrete type. We say that the process is in state $x$ at time $n$ if
$S_n = x$. If the particle continues to move indefinitely according to $S_n = S_{n-1}
+ Z_n$, we say that the random walk is unrestricted. The motion of the
particle is usually restricted in some manner by the presence of one or
more barriers. For example, a random walk starting at the origin may be
restricted to within a distance $a$ in the positive direction and $b$ in the negative
direction from the origin in such a way that, whenever the particle
touches or crosses the barrier $a$ or $-b$, it is absorbed there and the motion
ceases. The states $x \geq a$ and $x \leq -b$ are absorbing states, and $a$ and $-b$
are known as absorbing barriers.

An example is the problem of Gambler's Ruin, studied in Example 14.2.6.
There $S_n$ is player B's cumulative gain at the end of $n$ games. Provided that
$-b < S_n < a$, we have $S_n = \sum_{i=1}^{n} Z_i$, where the $Z_i$'s are iid with pmf
$P\{Z_i = 1\} = 1 - p$, $P\{Z_i = -1\} = p$, $0 < p < 1$. If at any stage $S_n =
a$ (or $-b$), B has gained (lost) all of A's (his) capital and A (B) is ruined.
The problem of interest is the probability of A's ruin, which is given by
(14.2.16).

The connection between a random walk with two absorbing barriers
and the SPRT should now be obvious. Let $Z_1, Z_2, \ldots$ be iid rv's, and
$S_n = \sum_{i=1}^{n} Z_i$. Also, let $N$ be the smallest integer $n$ such that either

\[ S_n \geq a \ (= \log A) \qquad \text{or} \qquad S_n \leq -b \ (= \log B), \qquad a, b > 0. \]

Then $N$ is an rv and can be regarded as the time of absorption for the
random walk described by $S_n$ with absorbing barriers at $a$ and $-b$. If the
SPRT terminates with probability 1, then $1 - \alpha$ and $\alpha$ are the probabilities
that absorption happens for the first time when the particle has a position
$S_n \leq -b$ or $S_n \geq a$, respectively, given that $H_0$ is true. A similar
interpretation is given to $\beta$ and $1 - \beta$.


Having defined the test, let us now consider the determination of stopping 
bounds. The following result holds. 


Theorem 1. For the SPRT with stopping bounds $A$ and $B$, $A > B$, and
strength $(\alpha, \beta)$, we have

\[ A \leq \frac{1 - \beta}{\alpha} \qquad \text{and} \qquad B \geq \frac{\beta}{1 - \alpha}. \tag{12} \]


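Proof. Let
\[ R_n = \{(x_1, x_2, \ldots, x_n)\colon N = n \text{ and } \lambda_N(\mathbf{x}) \geq A\} \]
and
\[ S_n = \{(x_1, x_2, \ldots, x_n)\colon N = n \text{ and } \lambda_N(\mathbf{x}) \leq B\}. \]
Then the $\{R_n\}$ are mutually disjoint, and so also are the sets $\{S_n\}$. Also
$\mathbf{x} \in R_n$ means that after taking $n$ observations the SPRT terminates in the
rejection of $H_0$. Thus

\begin{align*}
\alpha &= P\Big\{\bigcup_{n=1}^{\infty} R_n \Bigm| H_0\Big\} = \sum_{n=1}^{\infty} P_{H_0}(R_n) \\
&= \sum_{n=1}^{\infty} \int_{R_n} f_{0n}(\mathbf{x})\, d\mathbf{x} \quad \text{if } f_{0n} \text{ is a pdf (a similar argument holds in the discrete case)} \\
&\leq A^{-1} \sum_{n=1}^{\infty} \int_{R_n} f_{1n}(\mathbf{x})\, d\mathbf{x} \\
&= A^{-1} P_{H_1}\{\text{Reject } H_0\} \\
&= A^{-1}(1 - \beta).
\end{align*}

Note that we have assumed above (and in what follows) that

\[ \sum_{n=1}^{\infty} P\{N = n \mid H_i\} = 1, \qquad i = 0, 1, \]

that is, the SPRT terminates eventually with probability 1. We will prove
this assertion later. Similarly

\begin{align*}
\beta &= P\Big\{\bigcup_{n=1}^{\infty} S_n \Bigm| H_1\Big\} = \sum_{n=1}^{\infty} \int_{S_n} f_{1n}(\mathbf{x})\, d\mathbf{x} \leq B \sum_{n=1}^{\infty} \int_{S_n} f_{0n}(\mathbf{x})\, d\mathbf{x} \\
&= B\, P_{H_0}\{\text{Accept } H_0\} = B(1 - \alpha).
\end{align*}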


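Remark 6. Note that $\alpha \leq A^{-1}(1 - \beta) \leq A^{-1}$ and $\beta \leq B(1 - \alpha) \leq B$. Thus
the shaded region in Fig. 2 is the region of values of $\alpha$ and $\beta$, given $A$ and $B$.

Fig. 2. Region of values of $(\alpha, \beta)$ for given $A$ and $B$ (axis marks at 0, $1/A$, 1, and $B$).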
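Inequalities (12) suggest the possibility of approximating the boundaries
$A$ and $B$ by

\[ A' = \frac{1 - \beta}{\alpha} \qquad \text{and} \qquad B' = \frac{\beta}{1 - \alpha}. \tag{13} \]

Let $(\alpha', \beta')$ be the true strength of the SPRT with bounds $(A', B')$. Note
that since $\alpha$ and $\beta$ are usually small, $A' > 1 > B' > 0$. In what follows, we
will assume that $0 < B < 1 < A$.

Theorem 2. If for the choice

\[ A' = \frac{1 - \beta}{\alpha}, \qquad B' = \frac{\beta}{1 - \alpha} \]

the SPRT terminates with probability 1 and the strength is $(\alpha', \beta')$, then

\[ \alpha' \leq \frac{\alpha}{1 - \beta}, \qquad \beta' \leq \frac{\beta}{1 - \alpha}, \qquad \text{and} \qquad \alpha' + \beta' \leq \alpha + \beta. \tag{14} \]

Proof. The proof is left as an exercise.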
Remark 7. The errors $\alpha'$ and $\beta'$ cannot both exceed $\alpha$ and $\beta$, respectively; at least one
of $\alpha' \leq \alpha$, $\beta' \leq \beta$ is true. Generally both are true, and the choice of $A'$, $B'$
leads to a more stringent test than the one with stopping bounds $A$ and
$B$. In fact, since $\alpha$, $\beta$ are small,

\[ \alpha' \leq \frac{\alpha}{1 - \beta} \approx \alpha(1 + \beta) \]

and

\[ \beta' \leq \frac{\beta}{1 - \alpha} \approx \beta(1 + \alpha), \]

so that the increase in the error size in $\alpha'$ (or $\beta'$) is not beyond a factor
$(1 + \beta)$ [or $(1 + \alpha)$]. For example, if $\alpha = \beta = .05$, then $\alpha'$ and $\beta'$ are each bounded by approximately $.05(1.05)
= .0525$.
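The approximate bounds (13) and the error bounds of Theorem 2 are easy to tabulate. The sketch below is illustrative only; the function names `wald_bounds` and `strength_bounds` are hypothetical and introduced here for the example.

```python
def wald_bounds(alpha, beta):
    """Approximate stopping bounds (13): A' = (1 - beta)/alpha, B' = beta/(1 - alpha)."""
    return (1 - beta) / alpha, beta / (1 - alpha)

def strength_bounds(alpha, beta):
    """Upper bounds from Theorem 2 on the true errors alpha', beta'."""
    return alpha / (1 - beta), beta / (1 - alpha)
```

For $\alpha = \beta = .05$ this gives $A' = 19$, $B' = 1/19$, and the exact bound $\alpha/(1 - \beta) \approx .0526$, which agrees to first order with the value $.0525$ quoted in Remark 7.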


Remark 8. $A'$ and $B'$ are functions of $\alpha$, $\beta$ and can be computed, once and
for all, free of the pdf (pmf) $f_\theta$. Note that in the Neyman-Pearson theory the
critical values depend on $f_\theta$ and $\alpha$. This means that no distribution problem
is involved in the SPRT, except when one is interested in the distribution
of $N$, the number of trials required to stop.


Remark 9. The only serious risk in using $(A', B')$ is that $(\alpha', \beta')$ may be
much smaller than required. As a consequence one may need to take a
large number of observations. Fortunately, however, there is reason to believe
that this effect is also moderate because, if we consider the last step
from $S_{n-1}$ to $S_n$ and suppose, for definiteness, that $H_0$ is rejected at the $n$th
observation, then $S_{n-1} < \log A \approx \log[(1 - \beta)/\alpha] \leq S_n$, and the approximation
consists in replacing $S_n$ by $\log A'$. Thus the error is due to the last
observation $Z_n$, so that the excess over $\log A'$ is due to $Z_n$ and will usually
be moderate.


Remark 10. Theorems 1 and 2 hold even for dependent observations, provided
we assume that $P\{N < \infty \mid H_i\} = 1$, $i = 0, 1$.


Example 3. In Example 1 above we may write the test procedure as follows:
Continue sampling by taking another observation as long as

\[ \log B - n \log \frac{1 - p_1}{1 - p_0} < r \left( \log \frac{p_1}{p_0} - \log \frac{1 - p_1}{1 - p_0} \right) < \log A - n \log \frac{1 - p_1}{1 - p_0}. \]

If $p_1 > p_0$, this is the same as stating: Continue if

\[ \frac{\log B + n \log [(1 - p_0)/(1 - p_1)]}{\log [p_1(1 - p_0)/p_0(1 - p_1)]} < r < \frac{\log A + n \log [(1 - p_0)/(1 - p_1)]}{\log [p_1(1 - p_0)/p_0(1 - p_1)]}, \]

stop and reject $H_0$ if

\[ r \geq \frac{\log A + n \log [(1 - p_0)/(1 - p_1)]}{\log [p_1(1 - p_0)/p_0(1 - p_1)]}, \]

and reject $H_1$ if

\[ r \leq \frac{\log B + n \log [(1 - p_0)/(1 - p_1)]}{\log [p_1(1 - p_0)/p_0(1 - p_1)]}. \]

Here $r$ is the number of 1's in the sequence to date.
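Let $\alpha = .1$, $\beta = .1$, and $p_0 = \tfrac{1}{4}$, $p_1 = \tfrac{3}{4}$. Using the approximation in
Theorem 2, we take

\[ A = \frac{1 - \beta}{\alpha} = 9 \qquad \text{and} \qquad B = \frac{\beta}{1 - \alpha} = \frac{1}{9}. \]

The SPRT is to continue if

\[ \frac{n}{2} - 1 = \frac{-\log 9 + n \log 3}{\log 9} < r < \frac{\log 9 + n \log 3}{\log 9} = \frac{n}{2} + 1, \]

to reject $H_0$ if

\[ r \geq \frac{n}{2} + 1, \]

and to reject $H_1$ if

\[ r \leq \frac{n}{2} - 1. \]

If we observe the sequence 0, 1, 0, 0, then $r = 1$, $n = 4$, so that we reject
$H_1$ since $r = 1 \leq \tfrac{4}{2} - 1 = 1$.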
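The count form of the Bernoulli SPRT in Example 3 can be checked numerically. The following is a sketch under stated assumptions: the function name `bernoulli_sprt`, its return values, and the tolerance `tol` (a guard against round-off when the count lands exactly on a boundary) are not from the text. The bounds are taken as $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$, and $p_1 > p_0$ is assumed.

```python
import math

def bernoulli_sprt(xs, p0, p1, alpha, beta, tol=1e-9):
    """SPRT of H0: p = p0 against H1: p = p1 (p1 > p0), written in terms of
    the running count r of 1's and the approximate bounds of Theorem 2."""
    log_A = math.log((1 - beta) / alpha)
    log_B = math.log(beta / (1 - alpha))
    drift = math.log((1 - p0) / (1 - p1))                 # multiplies n
    denom = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))     # positive when p1 > p0
    r, n = 0, 0
    for n, x in enumerate(xs, start=1):
        r += x
        lower = (log_B + n * drift) / denom
        upper = (log_A + n * drift) / denom
        if r >= upper - tol:
            return "reject H0", n
        if r <= lower + tol:
            return "reject H1", n
    return "continue", n
```

With $p_0 = \tfrac14$, $p_1 = \tfrac34$, $\alpha = \beta = .1$ the boundaries reduce to $n/2 \pm 1$, and the sequence 0, 1, 0, 0 leads to rejection of $H_1$ at the fourth observation, as in the example.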
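Let us compare this result with the fixed sample size procedure. Note that

\[ E_{H_0} R = n p_0 = \frac{n}{4}, \qquad \operatorname{var}_{H_0}(R) = n p_0 (1 - p_0) = \frac{3n}{16}, \]

and

\[ E_{H_1} R = n p_1 = \frac{3n}{4}, \qquad \operatorname{var}_{H_1}(R) = n p_1 (1 - p_1) = \frac{3n}{16}. \]

Using normal approximations, we have

\[ .1 = P_{H_0}\{R > k\} \approx P\left\{Z > \frac{k - (n/4)}{\sqrt{3n}/4}\right\} \]
\[ .1 = P_{H_1}\{R \leq k\} \approx P\left\{Z \leq \frac{k - (3n/4)}{\sqrt{3n}/4}\right\}, \]

where $Z$ is $\mathcal{N}(0, 1)$. From the tables

\[ \frac{k - (n/4)}{\sqrt{3n}/4} = 1.28 \]

and

\[ \frac{k - (3n/4)}{\sqrt{3n}/4} = -1.28, \]

giving $n/2 = 2.56(\sqrt{3n}/4)$ or $n \approx 5$. At least in this case, we see that there
is a saving of one observation.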
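The fixed sample size in this comparison follows from eliminating $k$ between the two normal-approximation equations, which gives $\sqrt{n}\,(p_1 - p_0) = z_\alpha \sqrt{p_0(1 - p_0)} + z_\beta \sqrt{p_1(1 - p_1)}$. A minimal sketch of that calculation is given below; the function name `fixed_sample_size` is hypothetical, and exact normal quantiles are used in place of the tabled value 1.28.

```python
import math
from statistics import NormalDist

def fixed_sample_size(p0, p1, alpha, beta):
    """Normal-approximation sample size n for testing p0 against p1 > p0
    with error probabilities alpha and beta (returned as a real number)."""
    z_a = NormalDist().inv_cdf(1 - alpha)      # upper alpha point of N(0, 1)
    z_b = NormalDist().inv_cdf(1 - beta)
    s0 = math.sqrt(p0 * (1 - p0))
    s1 = math.sqrt(p1 * (1 - p1))
    return ((z_a * s0 + z_b * s1) / (p1 - p0)) ** 2
```

For $p_0 = \tfrac14$, $p_1 = \tfrac34$, $\alpha = \beta = .1$ the value is a little under 5, so five observations are needed, in agreement with the text.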


PROBLEMS 14.5 
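1. Find the sequential probability ratio test with stopping bounds $B < 1 < A$
of the simple hypothesis $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ in each of the following
cases:
(a) $X \sim P(\theta)$, $\theta > 0$.
(b) $X \sim \mathcal{N}(\mu, \theta^2)$, $\mu$ known.
(c) $X \sim G(1, 1/\theta)$, $\theta > 0$.
(d) $X$ has pdf $f_\theta(x)$, given by $f_\theta(x) = e^{-(x - \theta)}$ if $x > \theta$, and $= 0$ otherwise.
(e) $X \sim NB(1; \theta)$, $0 < \theta < 1$.
2. In Problem 1 use the approximation for $A$ and $B$ from Theorem 2 with
$\alpha = \beta = .1$. Find the SPRT of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ in each of the following
cases:
(a) $X \sim P(\theta)$, $\theta > 0$; $\theta_0 = 1$, $\theta_1 = 2$.
(b) $X \sim \mathcal{N}(\mu, \theta^2)$, $\theta_0 = 9$, $\theta_1 = 4$.
(c) $X \sim G(1, 1/\theta)$, $\theta_0 = 1$, $\theta_1 = 2$.
(d) $X$ has pdf $f_\theta(x) = e^{-(x - \theta)}$ if $x > \theta$, and $= 0$ otherwise, and $\theta_0 = 0$, $\theta_1 = 1$.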
(€) X-—NB(150) 6 = +, 06, = 3. 


3. Prove Theorem 2. 


14.6 SOME PROPERTIES OF THE SEQUENTIAL PROBABILITY RATIO 
TEST 


In this section we study some properties of the SPRT. We begin by showing
that sampling terminates eventually with probability 1. In the following we
write $f_{\theta_0} = f_0$, $f_{\theta_1} = f_1$.

Theorem 1 (Stein [126]). Let $X_1, X_2, \ldots$ be iid rv's with common pdf (pmf)
$f_i$ under $H_i$ $(i = 0, 1)$. Let $N$ be the number of observations required for
reaching a decision by using the SPRT. With respect to any hypothesis
$H$ (not necessarily $H_0$ or $H_1$) for which $P\{|Z| > 0 \mid H\} > 0$, where $Z =
\log\{f_1(X_1)/f_0(X_1)\}$, the following results hold:

\[ P_H\{N < \infty\} = 1, \tag{1} \]
\[ E_H\{e^{tN}\} < \infty \qquad \text{for } -\infty < t < t_0,\ t_0 > 0. \tag{2} \]


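Proof. The rv's $Z_i = \log\{f_1(X_i)/f_0(X_i)\}$ are iid. Let $r = r(m, k)$ be the
largest integer $\leq m/k$, where $m$, $k$ are fixed positive integers, and consider
the partial sums $S_k, S_{2k} - S_k, \ldots, S_{rk} - S_{(r-1)k}$, where $S_n = \sum_{i=1}^{n} Z_i$, $n = 1,
2, \ldots$. If $N > m$, then $S_i \in (b, a)$ for $i = 1, 2, \ldots, m$, $b = \log B$, $a = \log A$,
and, in particular, for $i = k, 2k, \ldots, rk$. Hence

\[ |T_i| = |S_{ik} - S_{(i-1)k}| < |b| + |a| = c, \text{ say}, \qquad i = 1, 2, \ldots, r. \]

It follows that

\[ P\{N > m\} \leq P\{|T_i| < c,\ i = 1, 2, \ldots, r\} = [P\{|T_1| < c\}]^r, \tag{3} \]

since the $T_i$ are iid. Thus

\[ \lim_{m \to \infty} P\{N > m\} \leq \lim_{r \to \infty} [P\{|T_1| < c\}]^r \qquad \text{for fixed } k. \]

If $P\{|T_1| < c\} < 1$, this last limit equals 0. This is what we show next.
Since $P\{|Z| > 0\} > 0$, we can find an $\varepsilon > 0$ such that either $P\{Z > \varepsilon\}
> 0$ or $P\{Z < -\varepsilon\} > 0$ (or both). Then

\begin{align*}
P\{|T_1| \geq c\} &= P\{|Z_1 + \cdots + Z_k| \geq c\} \\
&= P\{Z_1 + \cdots + Z_k \geq c\} + P\Big\{-\sum_{i=1}^{k} Z_i \geq c\Big\} \\
&\geq P\Big\{Z_i \geq \frac{c}{k},\ i = 1, 2, \ldots, k\Big\} + P\Big\{Z_i \leq -\frac{c}{k},\ i = 1, 2, \ldots, k\Big\}.
\end{align*}

Choose and fix $k = k(\varepsilon)$, an integer such that $c/k < \varepsilon$. Then

\[ P\{|T_1| \geq c\} \geq [P\{Z > \varepsilon\}]^k + [P\{Z < -\varepsilon\}]^k > 0 \]

by hypothesis. Thus

\[ P\{|T_1| < c\} = \delta < 1, \]

where $\delta$ is independent of $m$. It follows that

\[ \lim_{m \to \infty} P\{N > m\} \leq \lim_{r \to \infty} \delta^r = 0, \]

where $r = r(m, k) = r(m)$, since $k$ is fixed. Thus

\[ P\{N < \infty\} = 1 - P\{N = \infty\} = 1 - \lim_{m \to \infty} P\{N > m\} = 1. \]

To prove (2), we note that we have the estimate

\[ P\{N > m\} \leq \delta^r \qquad \text{for every } m. \tag{4} \]

Thus, since $r > (m/k) - 1$ and $0 < \delta < 1$,

\begin{align*}
E e^{tN} &= \sum_{m=1}^{\infty} e^{tm} P\{N = m\} \\
&\leq \sum_{m=1}^{\infty} e^{tm} P\{N > m - 1\} \\
&= e^t \sum_{m=0}^{\infty} e^{tm} P\{N > m\} \\
&\leq e^t \sum_{m=0}^{\infty} e^{tm} \delta^r \\
&\leq e^t \delta^{-1} \sum_{m=0}^{\infty} (e^t \delta^{1/k})^m < \infty,
\end{align*}

provided that $e^t \delta^{1/k} < 1$ or $t < -(\log \delta)/k = t_0$. Since $0 < \delta < 1$, $t_0 > 0$
and the proof is complete.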


Remark 1. In the proof of (1) we obtain the rate of convergence of
$P\{N > m\}$ to 0 as $m \to \infty$, and show that it is exponential. Clearly, the
probability generating function $E z^N$ of the rv $N$ is finite for $0 < z < 1/\delta^{1/k}$.
Moments of all order exist for the stopping variable $N$.


Corollary to Theorem 1. The SPRT terminates with probability 1 under
both $H_0$ and $H_1$.

For the proof we need only to show that
\[ P\{|Z| > 0 \mid H_i\} > 0, \qquad i = 0, 1. \]
We need the following lemma.

Lemma 1. Let $f$ and $g$ be pdf's (pmf's), and let $C = \{f \neq g\}$. Then

\[ \int_C f(x) \log \frac{f(x)}{g(x)}\, dx \geq 0 \tag{5} \]

if $f$ and $g$ are pdf's, and

\[ \sum_C f(x) \log \frac{f(x)}{g(x)} \geq 0 \tag{6} \]

if $f$ and $g$ are pmf's, with equality if and only if $f(x) = g(x)$ except for a
null set.

We leave the reader to supply the proof of Lemma 1. From Lemma 1, it
follows that, unless $P\{f_0(X) = f_1(X)\} = 1$, we have

\[ E\{Z \mid H_0\} < 0. \]

This implies that $P\{Z < 0 \mid H_0\} > 0$ or $P\{|Z| > 0 \mid H_0\} > 0$. Similarly, by
taking $f = f_1$, $g = f_0$, we obtain, from Lemma 1, $P\{|Z| > 0 \mid H_1\} > 0$.

We next study the power function (operating characteristic function) of the
SPRT. Let us write

\[ \beta(\theta) = P_\theta\{\text{Reject } H_0\}. \tag{7} \]

An exact evaluation of $\beta(\theta)$ is, in general, very difficult. A simple approximation
can be obtained, however, provided that the equation

\[ E_\theta \left\{ \frac{f_1(X)}{f_0(X)} \right\}^t = E_\theta e^{tZ} = 1 \tag{8} \]

has a nonzero solution, $t = t(\theta)$.


Lemma 2 (Wald [134], 158). If $Z$ is an rv such that

\[ P\{Z > 0\} > 0 \qquad \text{and} \qquad P\{Z < 0\} > 0, \tag{9} \]
\[ M(t) = E e^{tZ} \text{ exists for any real value } t, \tag{10} \]

and

\[ EZ \neq 0, \tag{11} \]

then there exists a $t^* \neq 0$ such that $M(t^*) = 1$. Moreover, if $EZ < 0$,
then $t^* > 0$; and if $EZ > 0$, then $t^* < 0$.


Proof. Since $P\{Z > 0\} > 0$, there exists an $\varepsilon > 0$ such that $P\{Z > \varepsilon\} =
\delta > 0$. If $t > 0$, then

\[ M(t) \geq \int_{\varepsilon}^{\infty} e^{tz} f(z)\, dz \geq e^{t\varepsilon} P\{Z > \varepsilon\}, \tag{12} \]

where $f$ is the pdf of $Z$ if $Z$ is a continuous rv. Inequality (12) holds also
for the case where $Z$ is a discrete rv. It follows that in either case

\[ M(t) \to \infty \qquad \text{as } t \to \infty. \]

Similarly, we use $P\{Z < 0\} > 0$ to conclude that

\[ M(t) \to \infty \qquad \text{as } t \to -\infty. \]

Now condition (10) implies that moments of all order of $Z$ exist and
may be obtained by differentiating $M(t)$ under the expectation sign as
many times as needed. Thus

\[ M''(t) = E(Z^2 e^{tZ}) > 0 \qquad \text{for all real } t, \]

using the same argument as above. It follows that $M(t)$ is a convex function
of $t$; and since $M(t) \to \infty$ as $t \to \pm\infty$, $M$ has a minimum at $t = t_0$,
where $M'(t_0) = 0$. Since $M'(t) = E(Z e^{tZ})$, $M'(t)|_{t=0} = EZ \neq 0$ by (11). Thus
$t_0 \neq 0$, and since $M(t_0)$ is minimum,

\[ M(t_0) < M(0) = 1. \]

Thus $M$ is strictly decreasing over $(-\infty, t_0)$ and strictly increasing over
$(t_0, \infty)$. Since $M(0) = 1$ and $M(t_0) < 1$, it follows that there exists exactly
one real $t^* \neq 0$ such that $M(t^*) = 1$.

Fig. 1. $t^* < 0$.    Fig. 2. $t^* > 0$.    (Graphs of $M(t)$ against $t$, with $M(0) = 1$ marked.)

Thus, if $t_0 > 0$ (Fig. 2), so also $t^* > 0$; and if $t_0 < 0$ (Fig. 1), $t^* < 0$.
Furthermore, if $t_0 > 0$, then $EZ = M'(t)|_{t=0} < 0$; and if $t_0 < 0$, then
$EZ = M'(t)|_{t=0} > 0$. Thus $t^*$ and $EZ$ have opposite signs, and the proof
is complete.
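The proof of Lemma 2 suggests a simple numerical way to locate $t^*$: search on the side of the origin opposite in sign to $EZ$, bracket the root, and bisect. The sketch below does this for a discrete rv $Z$ with finitely many values; the function names `mgf` and `nonzero_root` are hypothetical, and condition (9) is assumed to hold for the input.

```python
import math

def mgf(t, values, probs):
    """M(t) = E exp(tZ) for a discrete rv Z with the given values and probabilities."""
    return sum(p * math.exp(t * z) for z, p in zip(values, probs))

def nonzero_root(values, probs, iters=200):
    """Nonzero root t* of M(t) = 1, assuming P{Z > 0} > 0, P{Z < 0} > 0, EZ != 0."""
    mean = sum(p * z for z, p in zip(values, probs))
    if mean == 0:
        raise ValueError("EZ = 0: the only root is t = 0")
    sign = 1.0 if mean < 0 else -1.0          # t* and EZ have opposite signs

    def g(u):                                  # M(sign * u) - 1 for u > 0
        return mgf(sign * u, values, probs) - 1.0

    hi = 1.0
    while g(hi) <= 0:                          # move outward until M exceeds 1
        hi *= 2.0
    lo = hi
    while g(lo) >= 0:                          # move inward until M is below 1
        lo /= 2.0
    for _ in range(iters):                     # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)
```

For $Z = \pm 1$ with $P\{Z = 1\} = .3$ the equation $M(t) = 1$ can be solved by hand, giving $t^* = \log(7/3) > 0$, which the routine reproduces; reversing the probabilities changes the sign of $t^*$.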


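Let us use Lemma 2 to approximate $\beta(\theta)$. Let $t^*$ be a nonzero solution
of (8). Then

\[ f_\theta^*(x) = \left[ \frac{f_1(x)}{f_0(x)} \right]^{t^*} f_\theta(x) \tag{13} \]

is a pdf (pmf). Suppose that $t^* > 0$ (the other case can be treated similarly),
and consider the SPRT with stopping bounds $A^{t^*}$, $B^{t^*}$ for testing $f_\theta$ against
$f_\theta^*$. The test is to continue as long as

\[ B^{t^*} < \prod_{i=1}^{n} \frac{f_\theta^*(x_i)}{f_\theta(x_i)} < A^{t^*}, \]

to reject $f_\theta$ if

\[ \prod_{i=1}^{n} \frac{f_\theta^*(x_i)}{f_\theta(x_i)} \geq A^{t^*}, \]

and to reject $f_\theta^*$ if

\[ \prod_{i=1}^{n} \frac{f_\theta^*(x_i)}{f_\theta(x_i)} \leq B^{t^*}. \]

Let $\alpha^*$ and $1 - \beta^*$ denote the probabilities of rejection (of $f_\theta$), where $f_\theta$ and
$f_\theta^*$, respectively, are the true densities. Then

\[ A^{t^*} \approx \frac{1 - \beta^*}{\alpha^*}, \qquad B^{t^*} \approx \frac{\beta^*}{1 - \alpha^*}. \tag{14} \]

But

\[ \prod_{i=1}^{n} \frac{f_\theta^*(x_i)}{f_\theta(x_i)} = \left[ \prod_{i=1}^{n} \frac{f_1(x_i)}{f_0(x_i)} \right]^{t^*}, \tag{15} \]

so that the test is the same as the SPRT of $H_0$ against $H_1$ with stopping
bounds $A$ and $B$. It follows that the probability of rejection for the two tests
when $f_\theta$ is true must be the same. Thus

\[ \alpha^* = \beta(\theta) \approx \frac{1 - B^{t^*}}{A^{t^*} - B^{t^*}}. \tag{16} \]

Equation (16) is obtained by solving (14) for $\alpha^*$.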
Note that $t(\theta_0) = 1$ is a solution of

\[ E_{\theta_0} e^{tZ} = 1 \]

and $t(\theta_1) = -1$ is a solution of

\[ E_{\theta_1} e^{tZ} = 1. \]

Also, $E_\theta e^{tZ} = 1$ when $t = 0$. However, $\beta(\theta)$ as given by (16) is not defined for $t = 0$. In this
case

\[ \beta(\theta) \approx \frac{-\log B}{\log A - \log B} \tag{17} \]

by L'Hospital's rule. Also,

\[ \beta(\theta_0) \approx \alpha \qquad \text{and} \qquad \beta(\theta_1) = P_{\theta_1}\{\text{Reject } H_0\} \approx 1 - \beta. \]

Finally, $\beta(\theta) \to 0$ as $t^* \to \infty$. Similarly, $\beta(\theta) \to 1$ as $t^* \to -\infty$.

We summarize these values of the power function in the following table:

    t:            -∞      -1        0                 1      ∞
    θ:                    θ_1                         θ_0
    β(θ):         1       1 - β     -log B/log(A/B)   α      0

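Example 1. Let $X_1, X_2, \ldots$ be iid $b(1, \theta)$ rv's. To test $H_0\colon \theta = \theta_0$ against
$H_1\colon \theta = \theta_1 > \theta_0$ we have

\begin{align*}
E_\theta \left\{ \frac{f_1(X)}{f_0(X)} \right\}^t &= E_\theta \left\{ \left[ \left( \frac{\theta_1}{\theta_0} \right)^X \left( \frac{1 - \theta_1}{1 - \theta_0} \right)^{1 - X} \right]^t \right\} \\
&= \theta \left( \frac{\theta_1}{\theta_0} \right)^t + (1 - \theta) \left( \frac{1 - \theta_1}{1 - \theta_0} \right)^t.
\end{align*}

Setting this equal to 1 and solving for $\theta$, we get

\[ \theta = \frac{1 - [(1 - \theta_1)/(1 - \theta_0)]^t}{(\theta_1/\theta_0)^t - [(1 - \theta_1)/(1 - \theta_0)]^t}. \tag{18} \]

Letting $t \to 0$, we get

\[ \theta = \frac{-\log [(1 - \theta_1)/(1 - \theta_0)]}{\log (\theta_1/\theta_0) - \log [(1 - \theta_1)/(1 - \theta_0)]}. \tag{19} \]

In particular, if $\theta_0 = .5$, $\theta_1 = .9$, $\alpha = \beta = .05$, then

\[ \theta = \frac{5^t - 1}{9^t - 1}. \]

Also, $B = \tfrac{1}{19}$, $A = 19$, so that

\[ \beta(\theta) \approx \frac{1 - 19^{-t}}{19^t - 19^{-t}}. \]

We have

    t:       -∞     -1      0       1      ∞
    θ:       1.0    .90     .733    .50    0
    β(θ):    1.0    .95     .50     .05    0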


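Example 2. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(\mu, 1)$ rv's. It is required to test
$H_0\colon \mu = \mu_0$ against $H_1\colon \mu = \mu_1 > \mu_0$. Then

\[ Z = \frac{\mu_0^2 - \mu_1^2}{2} + (\mu_1 - \mu_0) X \]

and

\[ E_\mu e^{tZ} = \exp\left\{ \frac{t}{2}(\mu_0^2 - \mu_1^2) + t(\mu_1 - \mu_0)\mu + \frac{t^2}{2}(\mu_1 - \mu_0)^2 \right\}. \tag{20} \]

Solving

\[ E_\mu e^{tZ} = 1 \]

with respect to $t(\mu)$, we get

\[ t(\mu) = \frac{\mu_1 + \mu_0 - 2\mu}{\mu_1 - \mu_0}. \tag{21} \]

The approximate power function is therefore

\[ \beta(\mu) \approx \frac{1 - [\beta/(1 - \alpha)]^{t(\mu)}}{[(1 - \beta)/\alpha]^{t(\mu)} - [\beta/(1 - \alpha)]^{t(\mu)}} \]

whenever $t(\mu) \neq 0$, where $t(\mu)$ is given by (21). It is easy to see that
$t(\mu) = 0$ for

\[ \mu = \mu^* = \frac{\mu_0 + \mu_1}{2}, \]

so that from (17)

\[ \beta(\mu^*) \approx \frac{\log [(1 - \alpha)/\beta]}{\log [(1 - \beta)/\alpha] + \log [(1 - \alpha)/\beta]}. \]

We have

    t:       -∞     -1       0        1      ∞
    μ:       +∞     μ_1      μ*       μ_0    -∞
    β(μ):    1      1 - β    β(μ*)    α      0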

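Next we find an approximation for $E_\theta N$. If we approximate $S_N$ by $\log
A$ if $H_0$ is rejected and by $\log B$ if $H_1$ is rejected, then

\[ E_\theta S_N \approx \beta(\theta) \log A + [1 - \beta(\theta)] \log B, \tag{22} \]

and if $E_\theta Z \neq 0$, then

\[ E_\theta N = \frac{E_\theta S_N}{E_\theta Z} \approx \frac{\beta(\theta) \log A + [1 - \beta(\theta)] \log B}{E_\theta Z}. \tag{23} \]

Thus, since $\beta(\theta_0) \approx \alpha$,

\[ E_{\theta_0} N \approx \frac{\alpha \log A + (1 - \alpha) \log B}{E_{\theta_0} Z}, \tag{24} \]

and

\[ E_{\theta_1} N \approx \frac{(1 - \beta) \log A + \beta \log B}{E_{\theta_1} Z}. \tag{25} \]

If $E_\theta Z = 0$, we will show in the next section that

\[ E_\theta N \approx -\frac{\log A \cdot \log B}{E_\theta Z^2}. \tag{26} \]

Example 3. In Example 1,

\begin{align*}
E_\theta Z &= \theta \log \frac{\theta_1}{\theta_0} + (1 - \theta) \log \frac{1 - \theta_1}{1 - \theta_0} \\
&= \theta \left[ \log \frac{\theta_1}{\theta_0} - \log \frac{1 - \theta_1}{1 - \theta_0} \right] + \log \frac{1 - \theta_1}{1 - \theta_0}.
\end{align*}

If $\theta_0 = .5$ and $\theta_1 = .9$, then

\[ E_\theta Z = \theta \log 9 - \log 5 \]

and

\[ E_\theta Z^2 = \theta \left( \log \frac{\theta_1}{\theta_0} \right)^2 + (1 - \theta) \left( \log \frac{1 - \theta_1}{1 - \theta_0} \right)^2. \]

We have

    t:         -∞         -1                 0       1         ∞
    θ:         1.0        .90                .733    .50       0
    β(θ):      1.0        .95                .50     .05       0
    E_θ Z:     log 1.8    .9 log 9 - log 5   0       log .6    -log 5
    E_θ N:     5.01       7.2                9.17    5.19      1.83

$E_\theta N$ when $E_\theta Z = 0$ is obtained by using (26).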
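The entries in the last row of the table follow from (23) and (26). A short numerical sketch of that calculation is given below; the helper names `asn` and `bernoulli_moments`, and the threshold used to decide when $E_\theta Z$ is treated as zero, are assumptions made for this illustration.

```python
import math

def asn(beta_theta, ez, ez2, A, B):
    """Approximate E_theta N from (23), or from (26) when E_theta Z is (numerically) 0."""
    log_a, log_b = math.log(A), math.log(B)
    if abs(ez) > 1e-9:
        return (beta_theta * log_a + (1 - beta_theta) * log_b) / ez   # (23)
    return -log_a * log_b / ez2                                        # (26)

def bernoulli_moments(theta, theta0, theta1):
    """E_theta Z and E_theta Z^2 for the Bernoulli log-likelihood ratio Z."""
    u = math.log(theta1 / theta0)                 # value of Z when X = 1
    d = math.log((1 - theta1) / (1 - theta0))     # value of Z when X = 0
    return theta * u + (1 - theta) * d, theta * u ** 2 + (1 - theta) * d ** 2

A, B = 19.0, 1.0 / 19.0
theta_star = math.log(5) / math.log(9)            # value of theta at t = 0, from (19)
rows = [(1.0, 1.0), (0.9, 0.95), (theta_star, 0.5), (0.5, 0.05), (0.0, 0.0)]
table = [asn(b, *bernoulli_moments(th, 0.5, 0.9), A, B) for th, b in rows]
```

Rounded to two decimals, `table` agrees with the tabulated row except in the last digit of the middle entry, where this calculation gives about 9.16.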
Example 4. Let $X \sim \mathcal{N}(\theta, 1)$, and $\theta_0 = 0$, $\theta_1 = 1$, $\alpha = \beta = .05$. Then
$A = .95/.05 = 19$, $B = .05/.95 = \tfrac{1}{19}$. Also,

\[ Z = \log \frac{f_1(X)}{f_0(X)} = X - \tfrac{1}{2}, \]

so that $E_\theta Z = \theta - \tfrac{1}{2}$, $E_{\theta_0} Z = -\tfrac{1}{2}$, $E_{\theta_1} Z = \tfrac{1}{2}$. Also,

\[ E_\theta Z^2 = E_\theta (X - \tfrac{1}{2})^2 = \theta^2 - \theta + \tfrac{5}{4}, \]

\[ E_{\theta_0} S_N \approx .05 \log 19 - .95 \log 19 = -.90 \log 19, \]

so that

\[ E_{\theta_0} N \approx 5.3, \]

and

\[ E_{\theta_1} S_N \approx .95 \log 19 - .05 \log 19 = .90 \log 19, \]

so that

\[ E_{\theta_1} N \approx 5.3. \]

Let us consider the fixed sample size procedure which rejects $H_0$ if
$\bar{X}_n > k$. Then

\[ \alpha = P_{\theta_0}\{\bar{X}_n > k\} = P\{\sqrt{n}\, \bar{X}_n > \sqrt{n}\, k\} = .05 \]

and

\[ \beta = P_{\theta_1}\{\bar{X}_n \leq k\} = P\{\sqrt{n}(\bar{X}_n - 1) \leq \sqrt{n}(k - 1)\} = .05. \]

It follows that

\[ 1.65 = \sqrt{n}\, k \qquad \text{and} \qquad -1.65 = \sqrt{n}\,(k - 1), \]

giving $n = 10.9$.

We have

    t:         -∞     -1      0       1       ∞
    θ:         +∞     1       .5      0       -∞
    β(θ):      1      .95     .5      .05     0
    E_θ Z:     +∞     .50     0       -.50    -∞
    E_θ N:     0      5.3     8.67    5.3     0

In Example 4 we see that there is some saving (on the average) if we use
the sequential test over the fixed sample size test. Let us examine this possibility
a little more closely.


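Example 5. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(\theta, 1)$ rv's, and suppose that we
wish to test $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ $(\theta_1 > \theta_0)$. Let us first consider
a fixed sample size procedure. The most powerful test rejects $H_0$ at level
$\alpha$ if $\bar{X} > k$. If the probability of type II error is fixed at a preassigned
level $\beta$, say, we need to choose $n$, the number of observations, so that

\[ P_{\theta_0}\{\bar{X} > k\} = \alpha \qquad \text{and} \qquad P_{\theta_1}\{\bar{X} \leq k\} = \beta. \]

It follows, on standardization, that

\[ \sqrt{n}\,(k - \theta_0) = z_\alpha \qquad \text{and} \qquad \sqrt{n}\,(k - \theta_1) = -z_\beta, \]

or

\[ n = \frac{(z_\alpha + z_\beta)^2}{(\theta_1 - \theta_0)^2}. \tag{27} \]

Let us now consider the corresponding SPRT. We have already seen that

\[ t(\theta) = \frac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0} \tag{28} \]

is a nonzero root of $E_\theta e^{tZ} = 1$, where $Z = \log [f_1(X)/f_0(X)]$. Moreover,
$t(\theta) = 0$ if $\theta = \theta^*$, where

\[ \theta^* = \frac{\theta_0 + \theta_1}{2}. \tag{29} \]

Also,

\begin{align*}
E_\theta Z &= E_\theta \left[ (\theta_1 - \theta_0) X + \tfrac{1}{2}(\theta_0^2 - \theta_1^2) \right] \\
&= -\tfrac{1}{2}(\theta_1 - \theta_0)^2\, t(\theta) \tag{30}
\end{align*}

and

\begin{align*}
E_\theta Z^2 &= E_\theta \left[ \tfrac{1}{4}(\theta_0^2 - \theta_1^2)^2 + (\theta_1 - \theta_0)(\theta_0^2 - \theta_1^2) X + (\theta_1 - \theta_0)^2 X^2 \right] \\
&= \tfrac{1}{4}(\theta_0^2 - \theta_1^2)^2 + (\theta_1 - \theta_0)(\theta_0^2 - \theta_1^2)\theta + (\theta_1 - \theta_0)^2 (1 + \theta^2). \tag{31}
\end{align*}

Thus

\[ E_{\theta^*} Z^2 = (\theta_1 - \theta_0)^2. \tag{32} \]

Using approximations (24) and (25), we have

\[ E_{\theta_0} N \approx -2\, \frac{\alpha \log [(1 - \beta)/\alpha] + (1 - \alpha) \log [\beta/(1 - \alpha)]}{(\theta_1 - \theta_0)^2} \tag{33} \]

and

\[ E_{\theta_1} N \approx 2\, \frac{(1 - \beta) \log [(1 - \beta)/\alpha] + \beta \log [\beta/(1 - \alpha)]}{(\theta_1 - \theta_0)^2}. \tag{34} \]

Let us write $r_0 = E_{\theta_0} N / n$, $r_1 = E_{\theta_1} N / n$, where $n$ is given by (27). Then

\[ r_0 \approx -2\, \frac{\alpha \log [(1 - \beta)/\alpha] + (1 - \alpha) \log [\beta/(1 - \alpha)]}{(z_\alpha + z_\beta)^2} \tag{35} \]

and

\[ r_1 \approx 2\, \frac{(1 - \beta) \log [(1 - \beta)/\alpha] + \beta \log [\beta/(1 - \alpha)]}{(z_\alpha + z_\beta)^2}. \tag{36} \]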

Thus $r_0$ and $r_1$ depend only on $\alpha$ and $\beta$, and not on $\theta_0$ and $\theta_1$, as long as
$\theta_1 > \theta_0$. In Tables 1 and 2 we have computed $100 r_0$ and $100 r_1$ for selected
values of $\alpha$ and $\beta$.

Table 1. $100 r_0$

                      α
      β       .01     .03     .05     .10
     .01     41.5    48.8    52.5    56.4
     .03     38.4    46.2    50.5    55.7
     .05     36.6    44.7    49.2    55.2
     .10     33.1    41.4    46.4    53.7

Table 2. $100 r_1$

                      α
      β       .01     .03     .05     .10
     .01     41.5    38.4    36.6    33.1
     .03     48.8    46.2    44.7    41.4
     .05     52.5    50.5    49.2    46.4
     .10     56.4    55.7    55.2    53.7

It is clear that the average sample size required under $H_0$ and $H_1$ when the
SPRT is used is quite small in comparison to the one needed for a fixed
sample size procedure. In other words, the savings in number of observations
using SPRT is considerable. Indeed, the SPRT has the following
optimal property, which we state without proof.

Theorem 2 (Wald and Wolfowitz [135]). Among all tests (sequential or
not) of a simple hypothesis $H_0$ against a simple alternative $H_1$ for which

\[ P\{\text{Reject } H_0 \mid H_0\} \leq \alpha, \qquad P\{\text{Accept } H_0 \mid H_1\} \leq \beta, \]

and $E_{H_0} N$ and $E_{H_1} N$ are finite, the SPRT with strength $(\alpha, \beta)$ minimizes both
$E_{H_0} N$ and $E_{H_1} N$.

We conclude this section with some remarks and an example.


Remark 2. Several questions have not been answered in Section 14.5 and
in this section. For example, given $(\alpha, \beta)$, do there exist constants $A$ and $B$?
An affirmative answer was provided by Wijsman [138, 139]. Again, granted
the existence of stopping bounds $A$ and $B$, is the SPRT unique in any sense?
Several authors investigated this question. The answer is yes and we refer
to Anderson and Friedman [2] for a restricted but readable proof. Wijsman
proved uniqueness without any restrictions.

Remark 3. In many special situations there is no "overshooting" of the
stopping bounds, and the approximations of this section become exact. The
following example illustrates the point.

Example 6. If the rv's $X_i$ take three values, $-1$, 0, and 1, and $A$ and $B$ are
taken to be integers, there is no overshooting of bounds $A$ and $B$. In this
case the approximations obtained above are all exact.

Consider, for example, $X_1, X_2, \ldots$ iid rv's with common pmf

\[ f_p(x) = \begin{cases} p & \text{if } x = 1, \\ 1 - p & \text{if } x = -1, \end{cases} \qquad 0 < p < 1. \]

It is required to test $H_0\colon p = p_0$ against $H_1\colon p = 1 - p_0$, $0 < p_0 < \tfrac{1}{2}$. Then

\[ \frac{f_1(x)}{f_0(x)} = \left( \frac{1 - p_0}{p_0} \right)^x, \qquad x = \pm 1, \]

and

\[ Z_i = \log \frac{f_1(X_i)}{f_0(X_i)} = X_i \log \frac{1 - p_0}{p_0} = z X_i, \]

where $z = \log [(1 - p_0)/p_0] > 0$. Thus $Z_i$ takes values $+z$ and $-z$, respectively,
with probabilities $p$ and $1 - p$. Let us write

\[ \log A = a = lz, \qquad \log B = -b = -mz, \]

where $l$ and $m$ are positive integers. In this case we get exact results since
each step is of size $z$ or $-z$. We have

\[ M_p(t) = E_p e^{tZ} = p e^{tz} + (1 - p) e^{-tz}, \]

so that the nonzero root of $M_p(t) = 1$ is given by

\[ t^* = \frac{1}{z} \log \frac{1 - p}{p}, \]

provided that $p \neq \tfrac{1}{2}$. Now $\{S_N \geq a\}$ is the event "reject $H_0$", and
$\{S_N \leq -b\}$, the event "reject $H_1$". Thus from (16) and (17), we have

\[ P_p\{\text{Reject } H_0\} = \beta(p) = \begin{cases} \dfrac{1 - [(1 - p)/p]^{-m}}{[(1 - p)/p]^{l} - [(1 - p)/p]^{-m}} & \text{if } p \neq \tfrac{1}{2}, \\[2ex] \dfrac{m}{l + m} & \text{if } p = \tfrac{1}{2}. \end{cases} \]

Similarly from (23) and (26) we get

\[ E_p N = \begin{cases} \dfrac{1}{2p - 1} \left\{ l\, \beta(p) - m [1 - \beta(p)] \right\} & \text{if } p \neq \tfrac{1}{2}, \\[2ex] lm & \text{if } p = \tfrac{1}{2}. \end{cases} \]

The reader will notice that this is the same problem as was discussed in
Example 14.2.6. The probability $P_p\{\text{Reject } H_0\}$ is essentially the probability
of B's ruin, and $1 - P_p\{\text{Reject } H_0\}$ is the probability of A's ruin.
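Since the formulas of Example 6 are exact, they can be checked against a direct numerical solution of the one-step recursions for the walk on the integer lattice (positions measured in units of $z$). The sketch below is only an illustration; the function names `reject_h0_prob`, `expected_n`, and `by_recursion` are hypothetical, and the recursion is solved by repeated sweeps rather than in closed form.

```python
def reject_h0_prob(p, l, m):
    """P_p{Reject H0}: probability of reaching +l before -m, starting from 0."""
    if p == 0.5:
        return m / (l + m)
    rho = (1 - p) / p
    return (1 - rho ** (-m)) / (rho ** l - rho ** (-m))

def expected_n(p, l, m):
    """E_p N for the same walk."""
    if p == 0.5:
        return l * m
    beta_p = reject_h0_prob(p, l, m)
    return (l * beta_p - m * (1 - beta_p)) / (2 * p - 1)

def by_recursion(p, l, m, sweeps=5000):
    """Solve h(k) = p h(k+1) + (1-p) h(k-1) and d(k) = 1 + p d(k+1) + (1-p) d(k-1)
    on -m < k < l with h(l) = 1, h(-m) = 0, d(l) = d(-m) = 0, by iteration."""
    h = {k: 0.0 for k in range(-m, l + 1)}
    d = {k: 0.0 for k in range(-m, l + 1)}
    h[l] = 1.0
    for _ in range(sweeps):
        for k in range(-m + 1, l):
            h[k] = p * h[k + 1] + (1 - p) * h[k - 1]
            d[k] = 1 + p * d[k + 1] + (1 - p) * d[k - 1]
    return h[0], d[0]
```

For example, with $l = 3$, $m = 2$ and $p = \tfrac12$ both approaches give rejection probability $2/5$ and expected sample size $lm = 6$.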




PROBLEMS 14.6 


1. Prove Lemma 1.
[Hint: If $f$ is convex and $E|X| < \infty$, then $Ef(X) \geq f(E(X))$.]

2. Let $Z_1, Z_2, \ldots$ be independent rv's and $a_n$, $b_n$ be given constants, $a_n > b_n$, for
all $n$ such that $\limsup_{n \to \infty} a_n < \infty$ and $\liminf_{n \to \infty} b_n > -\infty$. Also, let $N$ be the smallest
$n$ such that $S_n = \sum_{i=1}^{n} Z_i \geq a_n$ or $S_n \leq b_n$. If $\delta > 0$ and $\varepsilon > 0$ can be found such
that at least one of $P\{Z_k > \delta\} \geq \varepsilon$ or $P\{Z_k < -\delta\} \geq \varepsilon$ holds for all $k$, then for
any $r \geq 0$

\[ \lim_{n \to \infty} n^r P\{N > n\} = 0, \]

that is, moments of all orders of $N$ exist.
3. For Problems 14.5.1(b) and 14.5.1(c) find the (approximate) power function
and the average sample number (expected sample size).


14.7. THE FUNDAMENTAL IDENTITY OF SEQUENTIAL ANALYSIS AND 
ITS APPLICATIONS 


In this section we prove the fundamental identity of sequential analysis and 
study some of its applications. 


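Theorem 1 (Bahadur [5]). Let $X_1, X_2, \ldots$ be iid rv's with common pdf
(pmf) $f$ under hypothesis $H$. Consider any sequential procedure with a
given stopping rule, and let $N$ be the number of observations required to
reach a decision. Let $Z_i = Z(X_i)$ be some (Borel-measurable) function of
$X_i$ and write $S_N = \sum_{i=1}^{N} Z_i$. Assume that $M(t) = E_H e^{tZ}$ exists for $t$ in some
neighborhood of the origin. Then $P\{N < \infty \mid H\} = 1$ ensures that for all $t$

\[ E_H\{e^{tS_N} [M(t)]^{-N}\} = P\{N < \infty \mid H^*\}, \tag{1} \]

where, under hypothesis $H^*$, the pdf (pmf) of $X$ is

\[ f^*(x) = \frac{e^{tZ(x)} f(x)}{M(t)}. \tag{2} \]

Proof. Let $R_n \subseteq \mathbb{R}^n$ be the region leading to the termination of the sequential
procedure at the $n$th stage. Then

\begin{align*}
E_H\{e^{tS_N} [M(t)]^{-N}\} &= \sum_{n=1}^{\infty} \int_{R_n} e^{t \sum_{i=1}^{n} Z(x_i)} [M(t)]^{-n} \prod_{i=1}^{n} f(x_i)\, d\mathbf{x} \\
&= \sum_{n=1}^{\infty} \int_{R_n} \prod_{i=1}^{n} f^*(x_i)\, d\mathbf{x} \\
&= P\{N < \infty \mid H^*\}.
\end{align*}

Here $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and we have assumed that the rv's are continuous.
The proof for the discrete case is similar.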


Corollary (The Fundamental Identity of Sequential Analysis: Wald [134],
159). Consider the SPRT and let

\[ Z = Z(X) = \log \frac{f_1(X)}{f_0(X)}. \]

If $H$ is any hypothesis such that $P\{|Z| > 0 \mid H\} > 0$, then

\[ E_H\{e^{tS_N} [M(t)]^{-N}\} = 1. \tag{3} \]

Proof. We leave the reader to supply the proof.
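The fundamental identity (3) can be checked numerically in a simple case. The sketch below assumes a walk with steps $+1$ and $-1$ taken with probabilities $p$ and $1 - p$, absorbed at integer barriers $l$ and $-m$; the function name `identity_lhs` and the truncation of the series after `steps` stages are assumptions of this illustration. It propagates the distribution of the unabsorbed walk and accumulates $e^{tS_N}[M(t)]^{-N}$ over the stages at which absorption occurs.

```python
import math

def identity_lhs(t, p, l, m, steps=2000):
    """Approximate E{exp(t S_N) M(t)^(-N)} for a +/-1 walk absorbed at l and -m."""
    M = p * math.exp(t) + (1 - p) * math.exp(-t)    # mgf of a single step
    mass = {0: 1.0}                                  # P{S_n = k, N > n}, starting at 0
    total, weight = 0.0, 1.0                         # weight holds M(t)^(-n)
    for _ in range(steps):
        weight /= M
        new_mass = {}
        for k, q in mass.items():
            for step, pr in ((1, p), (-1, 1 - p)):
                j = k + step
                if j >= l or j <= -m:                # absorption at this stage
                    total += q * pr * math.exp(t * j) * weight
                else:
                    new_mass[j] = new_mass.get(j, 0.0) + q * pr
        mass = new_mass
    return total
```

For instance, with $p = .4$, $l = 3$, $m = 2$ and $t = .3$ the accumulated sum is 1 up to round-off, as (3) asserts.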


The fundamental identity has many applications. Assume that both $a$ and
$b$ ($a = \log A$, $-b = \log B$) are finite and are the stopping bounds. Then, by
Theorem 14.6.1,

\[ P\{N < \infty \mid H_i\} = 1, \qquad i = 0, 1. \tag{4} \]

By Lemma 14.6.2, whenever $E_H Z \neq 0$, there exists a $t^* \neq 0$ such that
$M(t^*) = 1$ and (3) becomes

\[ E_H\{e^{t^* S_N}\} = 1. \tag{5} \]

Let $E_{-b} = E\{e^{t^* S_N} \mid S_N \leq -b\}$, and $E_a = E\{e^{t^* S_N} \mid S_N \geq a\}$. It follows easily
[using (4)] that

\[ P\{S_N \leq -b\} = p_{-b} = \frac{E_a - 1}{E_a - E_{-b}} \qquad \text{and} \qquad P\{S_N \geq a\} = p_a = \frac{1 - E_{-b}}{E_a - E_{-b}}. \tag{6} \]

If we neglect the excess over the boundaries and write $S_N = a$ when $S_N \geq a$,
and $S_N = -b$ when $S_N \leq -b$, then for $EZ \neq 0$ we have

\[ p_{-b} \approx \frac{e^{t^* a} - 1}{e^{t^* a} - e^{-t^* b}}, \qquad p_a \approx \frac{1 - e^{-t^* b}}{e^{t^* a} - e^{-t^* b}}. \tag{7} \]

If $EZ = 0$, then $t^* = 0$; and, letting $t^* \to 0$, we obtain

\[ p_{-b} \approx \frac{a}{a + b}, \qquad p_a \approx \frac{b}{a + b}, \tag{8} \]

which are independent of the particular distribution involved.

Let us next use (3) to obtain the approximate distribution of the stopping
rv $N$. Consider the substitution $z^{-1} = M(t)$. Since $M(t)$ is convex, this
equation has two real roots in $t$, provided that $z^{-1} > M(t_0)$, $t_0$ being the value
at which $M$ achieves a minimum. Let the roots be $\lambda_1(z)$ and $\lambda_2(z)$, where
$\lambda_1(1) = 0$ and $\lambda_2(1) = t^*$. Setting $t = \lambda_i(z)$ in (3), we get

\[ E\{e^{\lambda_i(z) S_N} z^N\} = 1, \qquad i = 1, 2. \tag{9} \]

Now, using the approximation resulting from neglecting the excess over the
bounds, we have

\[ p_a e^{a \lambda_i(z)} E_a(z^N) + p_{-b} e^{-b \lambda_i(z)} E_{-b}(z^N) = 1, \qquad i = 1, 2. \tag{10} \]

Equations (10) can be solved for $E_a$ and $E_{-b}$; and then, using (7) and (8), we
can obtain the pgf of $N$ from the following relation:

\[ E(z^N) = p_a E_a(z^N) + p_{-b} E_{-b}(z^N). \tag{11} \]

Here $E_a$ and $E_{-b}$ are conditional expectations, conditional on absorption at
$a$ and $-b$, respectively.

To obtain some further interesting results we observe that, if $M(t)$ exists
for all real $t$, the fundamental identity can be differentiated any number of
times under the expectation sign. Differentiating once and setting
$t = 0$, we obtain Wald's equation (Theorem 14.3.1):

\[ E S_N = EN \cdot EZ, \qquad EZ \neq 0, \tag{12} \]

and if $EZ = 0$, a second derivative yields (Theorem 14.3.2)

\[ EN = E S_N^2 / EZ^2. \tag{13} \]

Ignoring once again the excess over the boundaries, we have

\[ E S_N \approx a p_a - b p_{-b} \tag{14} \]

and

\[ E S_N^2 \approx a^2 p_a + b^2 p_{-b}. \tag{15} \]

It follows from (12), (13), (7), and (8) that

\[ EN \approx \begin{cases} \dfrac{a + b - a e^{-t^* b} - b e^{t^* a}}{EZ\,(e^{t^* a} - e^{-t^* b})} & \text{if } EZ \neq 0, \\[2ex] \dfrac{ab}{EZ^2} & \text{if } EZ = 0. \end{cases} \tag{16} \]

This proves (14.6.26).
Example 1 (Kemperman [56], 70). For $0 < \alpha < 1$, consider the pdf

\[ f(x) = \begin{cases} \alpha \nu e^{-\nu x} & \text{if } x \geq 0, \\ (1 - \alpha) \lambda e^{\lambda x} & \text{if } x < 0, \end{cases} \qquad (\lambda > 0,\ \nu > 0). \tag{17} \]

The mgf of $f$ is given by

\[ M(t) = \frac{\alpha \nu}{\nu - t} + \frac{(1 - \alpha) \lambda}{\lambda + t}, \tag{18} \]

which exists for $-\lambda < t < \nu$. In particular, if $\alpha = \lambda/(\lambda + \nu)$, then

\[ M(t) = \frac{\nu \lambda}{(\nu - t)(\lambda + t)}. \tag{19} \]

In random walk terminology, each step of the particle is the sum of two
independent components, one having an exponential distribution on $(0, \infty)$,
and the other, an exponential distribution on $(-\infty, 0)$. Alternatively, we
may regard each step as the difference between two independent rv's each
having an exponential distribution.

If the absorption occurs at the upper barrier $a$, the step that carries the
particle over the boundary must be from the positive component of the
mixture. The excess, $S_N - a$, is the excess of the exponentially distributed
rv $Z_N$ over the quantity $a - S_{N-1}$, provided that $S_N \geq a$. The excess therefore
is due to the last step. Now the exponential distribution has no memory
(Theorem 5.3.8). It follows that the distribution of the excess conditional
on $S_N \geq a$ is also the same. Thus

\[ E\{e^{t(S_N - a)} \mid S_N \geq a\} = \frac{\nu}{\nu - t}, \tag{20} \]

so that

\[ E\{e^{tS_N} \mid S_N \geq a\} = \frac{\nu e^{ta}}{\nu - t}, \]

independently of $N$. Similar remarks apply to the lower boundary.
In this case, therefore, the excess over the boundaries is exactly accounted
for, and we get exact results. The fundamental identity takes the form

(21)    $p_a\, e^{ta}\, \dfrac{\nu}{\nu - t}\, E\{[M(t)]^{-N} \mid S_N > a\} + p_{-b}\, e^{-tb}\, \dfrac{\lambda}{\lambda + t}\, E\{[M(t)]^{-N} \mid S_N < -b\} = 1.$
The equation

        $M(t) = \dfrac{\nu \lambda}{(\nu - t)(\lambda + t)} = 1$

yields t* = ν − λ. It follows from (21) that


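(22)    $p_a\, \dfrac{\nu}{\lambda}\, e^{(\nu - \lambda)a} + p_{-b}\, \dfrac{\lambda}{\nu}\, e^{-(\nu - \lambda)b} = 1.$

Since $p_a + p_{-b} = 1$, we get

(23)    $p_a = \begin{cases} \dfrac{1 - (\lambda/\nu)\, e^{-(\nu-\lambda)b}}{(\nu/\lambda)\, e^{(\nu-\lambda)a} - (\lambda/\nu)\, e^{-(\nu-\lambda)b}} & \text{if } \lambda \neq \nu, \\[2ex] \dfrac{b + 1/\lambda}{a + b + 2/\lambda} & \text{if } \lambda = \nu. \end{cases}$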
The second result in (23) is obtained by letting λ → ν in the first expression.
The approximate result in this case is

(24)    $p_a \approx \begin{cases} \dfrac{1 - e^{-(\nu-\lambda)b}}{e^{(\nu-\lambda)a} - e^{-(\nu-\lambda)b}} & \text{if } \lambda \neq \nu, \\[2ex] \dfrac{b}{a + b} & \text{if } \lambda = \nu. \end{cases}$
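The comparison between the exact and approximate absorption probabilities can be checked numerically. The sketch below assumes that (23) reads p_a = [1 − (λ/ν)e^{−(ν−λ)b}] / [(ν/λ)e^{(ν−λ)a} − (λ/ν)e^{−(ν−λ)b}] for λ ≠ ν, with limit (b + 1/λ)/(a + b + 2/λ), and that (24) is the same expression with the factors λ/ν and ν/λ dropped. The function names are illustrative and are not from the text.

```python
import math

def p_upper_exact(a, b, lam, nu):
    """Probability of absorption at the upper barrier a, exact form assumed for (23)."""
    if math.isclose(lam, nu):
        return (b + 1.0 / lam) / (a + b + 2.0 / lam)
    d = nu - lam  # this is t*, the nonzero root of M(t) = 1
    num = 1.0 - (lam / nu) * math.exp(-d * b)
    den = (nu / lam) * math.exp(d * a) - (lam / nu) * math.exp(-d * b)
    return num / den

def p_upper_approx(a, b, lam, nu):
    """Same probability when the excess over the boundaries is ignored, as in (24)."""
    if math.isclose(lam, nu):
        return b / (a + b)
    d = nu - lam
    return (1.0 - math.exp(-d * b)) / (math.exp(d * a) - math.exp(-d * b))

if __name__ == "__main__":
    for lam, nu in [(1.0, 1.0), (1.0, 1.5), (1.5, 1.0)]:
        print(lam, nu, p_upper_exact(2, 3, lam, nu), p_upper_approx(2, 3, lam, nu))
```

The two functions agree more closely as a and b grow relative to the mean step sizes 1/ν and 1/λ, which is when the excess over the boundary is small compared with the barriers.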


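Let us specialize (17) a little further by letting ν = 1 − θ and λ = 1 + θ,
|θ| < 1. Then α = (1 + θ)/2 and 1 − α = (1 − θ)/2, so that

        $f_\theta(x) = \begin{cases} \dfrac{1-\theta^2}{2}\, e^{-(1-\theta)x} & \text{if } x > 0, \\[1ex] \dfrac{1-\theta^2}{2}\, e^{(1+\theta)x} & \text{if } x < 0. \end{cases}$

To test H_0: θ = −1/2 against H_1: θ = +1/2 we use the SPRT. We have

        $Z_i = \log \dfrac{f_{1/2}(X_i)}{f_{-1/2}(X_i)}.$

It is convenient to write

        $f_\theta(x) = \dfrac{1-\theta^2}{2} \exp(\theta x - |x|), \qquad |\theta| < 1.$

Then

        $Z_i = \log\{\exp(\tfrac{1}{2} X_i - |X_i| + \tfrac{1}{2} X_i + |X_i|)\} = X_i,$

        $M(t) = \dfrac{1-\theta^2}{(1-\theta-t)(1+\theta+t)}, \qquad -(1+\theta) < t < 1-\theta,$

and

(28)    $t^* = \nu - \lambda = (1-\theta) - (1+\theta) = -2\theta.$

From (23),

(29)    $P_\theta\{S_N \geq a\} = p_a = \begin{cases} \dfrac{1 - \frac{1+\theta}{1-\theta}\, e^{2\theta b}}{\frac{1-\theta}{1+\theta}\, e^{-2\theta a} - \frac{1+\theta}{1-\theta}\, e^{2\theta b}} & \text{if } \theta \neq 0, \\[2ex] \dfrac{1 + b}{a + b + 2} & \text{if } \theta = 0. \end{cases}$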


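To find EN, we have, using a similar argument,

        $E_\theta\{S_N - a \mid S_N \geq a\} = \dfrac{1}{1-\theta}$

and

        $E_\theta\{S_N + b \mid S_N \leq -b\} = -\dfrac{1}{1+\theta}.$

Now

        $E_\theta Z = E_\theta X = \dfrac{2\theta}{1-\theta^2},$

which is ≠ 0 for θ ≠ 0. For θ ≠ 0,

(30)    $E_\theta N = \dfrac{1-\theta^2}{2\theta} \left[ p_a \left( a + \dfrac{1}{1-\theta} \right) - p_{-b} \left( b + \dfrac{1}{1+\theta} \right) \right].$

If θ = 0, we compute E_0 Z². We have

        $E_0 Z^2 = E_0 X^2 = \dfrac{1}{2} \int_{-\infty}^{\infty} x^2 e^{-|x|}\, dx = \int_0^{\infty} x^2 e^{-x}\, dx,$

so that E_0 Z² = 2. Thus

        $E_0 N = \tfrac{1}{2} E_0 S_N^2.$

Now

        $E_0 S_N^2 = p_a E_0\{S_N^2 \mid S_N \geq a\} + p_{-b} E_0\{S_N^2 \mid S_N \leq -b\}.$

Also

        $E_\theta\{(S_N - a)^2 \mid S_N \geq a\} = \dfrac{2}{(1-\theta)^2}$

and

        $E_\theta\{(S_N + b)^2 \mid S_N \leq -b\} = \dfrac{2}{(1+\theta)^2},$

so that

        $E_\theta\{S_N^2 \mid S_N \geq a\} = \dfrac{2}{(1-\theta)^2} + \dfrac{2a}{1-\theta} + a^2$

and

        $E_\theta\{S_N^2 \mid S_N \leq -b\} = \dfrac{2}{(1+\theta)^2} + \dfrac{2b}{1+\theta} + b^2.$

Finally,

        $E_0\{S_N^2 \mid S_N \geq a\} = 2 + 2a + a^2$

and

        $E_0\{S_N^2 \mid S_N \leq -b\} = 2 + 2b + b^2.$

It follows that

        $E_0 N = \tfrac{1}{2} \left[ p_a (2 + 2a + a^2) + p_{-b} (2 + 2b + b^2) \right]$

        $\phantom{E_0 N} = \dfrac{(b+1)(2 + 2a + a^2) + (a+1)(2 + 2b + b^2)}{2(a + b + 2)}$

(31)    $\phantom{E_0 N} = \tfrac{1}{2} + \tfrac{1}{2}(a+1)(b+1).$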
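A quick Monte Carlo run gives a sanity check on the θ = 0 results. The sketch assumes that for θ = 0 each step is the difference of two independent standard exponential rv's, that the walk stops when S_N ≥ a or S_N ≤ −b, and that (29) and (31) give p_a = (1 + b)/(a + b + 2) and E_0 N = 1/2 + (a + 1)(b + 1)/2. The function name, the seed, and the number of replications are arbitrary choices made for illustration.

```python
import random

def simulate_walk(a, b, reps=20000, seed=12345):
    """Estimate P(absorption at a) and EN for steps X = E1 - E2, E1, E2 ~ Exp(1)."""
    rng = random.Random(seed)
    hits_upper = 0
    total_steps = 0
    for _ in range(reps):
        s, n = 0.0, 0
        # continue while the particle is strictly between the barriers
        while -b < s < a:
            s += rng.expovariate(1.0) - rng.expovariate(1.0)
            n += 1
        total_steps += n
        hits_upper += s >= a
    return hits_upper / reps, total_steps / reps

if __name__ == "__main__":
    a, b = 1.0, 2.0
    p_hat, en_hat = simulate_walk(a, b)
    print("p_a:", p_hat, "vs", (1 + b) / (a + b + 2))
    print("EN :", en_hat, "vs", 0.5 + 0.5 * (a + 1) * (b + 1))
```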


Example 2. Let us return to Example 14.6.6 and compute the pgf of the
stopping variable N. The equation $x^{-1} = M(t)$ reduces to

        $p x e^{2tz} - e^{tz} + x(1 - p) = 0.$
The two solutions are
        $t_1(x) = \dfrac{1}{z} \log \left\{ \dfrac{1 + \sqrt{1 - 4p(1-p)x^2}}{2px} \right\}$
and
        $t_2(x) = \dfrac{1}{z} \log \left\{ \dfrac{1 - \sqrt{1 - 4p(1-p)x^2}}{2px} \right\}.$
Substituting $t = t_i(x)$ in Wald's fundamental identity, we get
        $E\{e^{t_i(x) S_N} x^N\} = 1, \qquad i = 1, 2.$
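The two roots can be verified numerically. The sketch below assumes, as the boundaries lz and −mz used in this example suggest, that each step takes the value +z with probability p and −z with probability 1 − p, so that M(t) = p e^{tz} + (1 − p) e^{−tz}. That form of M(t) is an assumption here, since Example 14.6.6 is not reproduced in this section. The function names are illustrative.

```python
import math

def mgf(t, p, z):
    """Assumed mgf of a step equal to +z w.p. p and -z w.p. 1 - p."""
    return p * math.exp(t * z) + (1 - p) * math.exp(-t * z)

def roots(x, p, z):
    """Both solutions t of M(t) = 1/x; requires 4 p (1 - p) x^2 <= 1."""
    disc = math.sqrt(1 - 4 * p * (1 - p) * x * x)
    t1 = math.log((1 + disc) / (2 * p * x)) / z
    t2 = math.log((1 - disc) / (2 * p * x)) / z
    return t1, t2

if __name__ == "__main__":
    t1, t2 = roots(0.9, 0.3, 0.5)
    # both products should be 1 up to rounding, so that M(t_i)^(-N) = x^N
    print(mgf(t1, 0.3, 0.5) * 0.9, mgf(t2, 0.3, 0.5) * 0.9)
```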


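Thus
        $E\{x^N \mid S_N = lz\}\, e^{t_i(x) lz}\, P\{S_N = lz\} + E\{x^N \mid S_N = -mz\}\, e^{-t_i(x) mz}\, P\{S_N = -mz\} = 1,$
that is,
        $P\{\text{Reject } H_0\}\, e^{t_1(x) lz}\, E\{x^N \mid S_N = lz\}$
        $\quad + (1 - P\{\text{Reject } H_0\})\, e^{-t_1(x) mz}\, E\{x^N \mid S_N = -mz\} = 1$
and

        $P\{\text{Reject } H_0\}\, e^{t_2(x) lz}\, E\{x^N \mid S_N = lz\}$
        $\quad + (1 - P\{\text{Reject } H_0\})\, e^{-t_2(x) mz}\, E\{x^N \mid S_N = -mz\} = 1.$


Solving for $E\{x^N \mid S_N = lz\}$ and $E\{x^N \mid S_N = -mz\}$, we get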
Table 1. Cumulative Binomial Probabilities, $\sum_{x=0}^{r} \binom{n}{x} p^x (1-p)^{n-x}$, r = 0, 1, 2, ..., n − 1


                                         p

 n  r     .01     .05     .10     .20     .25     .30    .333     .40     .50
 2  0   .9801   .9025   .8100   .6400   .5625   .4900   .4444   .3600   .2500
    1   .9999   .9975   .9900   .9600   .9375   .9100   .8888   .8400   .7500
 3  0   .9703   .8574   .7290   .5120   .4219   .3430   .2963   .2160   .1250
    1   .9997   .9928   .9720   .8960   .8438   .7840   .7407   .6480   .5000
    2  1.0000   .9999   .9990   .9920   .9844   .9730   .9629   .9360   .8750
 4  0   .9606   .8145   .6561   .4096   .3164   .2401   .1975   .1296   .0625
    1   .9994   .9860   .9477   .8192   .7383   .6517   .5926   .4752   .3125
    2  1.0000   .9995   .9963   .9728   .9492   .9163   .8889   .8208   .6875
    3          1.0000   .9999   .9984   .9961   .9919   .9877   .9744   .9375
 5  0   .9510   .7738   .5905   .3277   .2373   .1681   .1317   .0778   .0312
    1   .9990   .9774   .9185   .7373   .6328   .5283   .4609   .3370   .1874
    2  1.0000   .9988   .9914   .9421   .8965   .8370   .7901   .6826   .4999
    3           .9999   .9995   .9933   .9844   .9693   .9547   .9130   .8124
    4          1.0000  1.0000   .9997   .9990   .9977   .9959   .9898   .9686
 6  0   .9415   .7351   .5314   .2621   .1780   .1176   .0878   .0467   .0156
    1   .9986   .9672   .8857   .6553   .5340   .4201   .3512   .2333   .1094
    2  1.0000   .9977   .9841   .9011   .8306   .7442   .6804   .5443   .3438
    3           .9998   .9987   .9830   .9624   .9294   .8999   .8208   .6563
    4           .9999   .9999   .9984   .9954   .9889   .9822   .9590   .8907
    5          1.0000  1.0000   .9999   .9998   .9991   .9983   .9959   .9845
 7  0   .9321   .6983   .4783   .2097   .1335   .0824   .0585   .0280   .0078
    1   .9980   .9556   .8503   .5767   .4450   .3294   .2633   .1586   .0625
    2  1.0000   .9962   .9743   .8520   .7565   .6471   .5706   .4199   .2266
    3           .9998   .9973   .9667   .9295   .8740   .8267   .7102   .5000
    4          1.0000   .9998   .9953   .9872   .9712   .9547   .9037   .7734
    5                  1.0000   .9996   .9987   .9962   .9931   .9812   .9375
    6                          1.0000   .9999   .9998   .9995   .9984   .9922
 8  0   .9227   .6634   .4305   .1678   .1001   .0576   .0390   .0168   .0039
    1   .9973   .9427   .8131   .5033   .3671   .2553   .1951   .1064   .0352
    2   .9999   .9942   .9619   .7969   .6786   .5518   .4682   .3154   .1445
    3  1.0000   .9996   .9950   .9437   .8862   .8059   .7413   .5941   .3633
    4          1.0000   .9996   .9896   .9727   .9420   .9120   .8263   .6367
    5                  1.0000   .9988   .9958   .9887   .9803   .9502   .8555
    6                          1.0000   .9996   .9987   .9974   .9915   .9648
    7                                  1.0000   .9999   .9998   .9993   .9961
 9  0   .9135   .6302   .3874   .1342   .0751   .0404   .0260   .0101   .0020
    1   .9965   .9287   .7748   .4362   .3004   .1960   .1431   .0706   .0196
    2   .9999   .9916   .9470   .7382   .6007   .4628   .3772   .2318   .0899
    3  1.0000   .9993   .9916   .9144   .8343   .7296   .6503   .4826   .2540
    4           .9999   .9990   .9805   .9511   .9011   .8551   .7334   .5001
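Entries of Table 1 can be recomputed directly from the defining sum, which is useful for values of n, r, or p that the table does not list. The following is a minimal sketch using only the Python standard library; the function name is illustrative.

```python
from math import comb

def binomial_cdf(n, r, p):
    """P(X <= r) for X ~ b(n, p): sum of C(n, x) p^x (1 - p)^(n - x) for x = 0..r."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(r + 1))

if __name__ == "__main__":
    for n, r, p in [(2, 0, 0.01), (4, 1, 0.40), (7, 1, 0.10), (8, 4, 0.50)]:
        print(n, r, p, round(binomial_cdf(n, r, p), 4))
```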
Table 2. Tail Probability Under Standard Normal Distribution
This table gives the probability that the standard normal variable Z will
exceed a given positive value z, that is, P{Z > z_α} = α. The probabilities for
negative values of z are obtained by symmetry.
Source. Adapted with permission from P.G. Hoel,
4th Ed., John Wiley, New York, 1971, page 391.
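The tail probability described for Table 2 can be computed from the complementary error function, since P{Z > z} = (1/2) erfc(z/√2). The sketch below uses only the Python standard library; the function name is illustrative.

```python
import math

def normal_tail(z):
    """P(Z > z) for a standard normal Z, valid for any real z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

if __name__ == "__main__":
    for z in (0.0, 1.0, 1.645, 1.96, 2.576):
        print(z, round(normal_tail(z), 4))
    # symmetry for negative arguments: P(Z > -z) = 1 - P(Z > z)
    print(normal_tail(-1.0) + normal_tail(1.0))
```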
Table 4. Student's t-Distribution
The first column lists the number of degrees of freedom (n). The headings
of the other columns give probabilities (α) for t to exceed the entry value.
Use symmetry for negative t values.


                     α
   n      .10     .05    .025     .01    .005
   1    3.078   6.314  12.706  31.821  63.657
   2    1.886   2.920   4.303   6.965   9.925
   3    1.638   2.353   3.182   4.541   5.841
   4    1.533   2.132   2.776   3.747   4.604
   5    1.476   2.015   2.571   3.365   4.032
   6    1.440   1.943   2.447   3.143   3.707
   7    1.415   1.895   2.365   2.998   3.499
   8    1.397   1.860   2.306   2.896   3.355
   9    1.383   1.833   2.262   2.821   3.250
  10    1.372   1.812   2.228   2.764   3.169
  11    1.363   1.796   2.201   2.718   3.106
  12    1.356   1.782   2.179   2.681   3.055
  13    1.350   1.771   2.160   2.650   3.012
  14    1.345   1.761   2.145   2.624   2.977
  15    1.341   1.753   2.131   2.602   2.947
  16    1.337   1.746   2.120   2.583   2.921
  17    1.333   1.740   2.110   2.567   2.898
  18    1.330   1.734   2.101   2.552   2.878
  19    1.328   1.729   2.093   2.539   2.861
  20    1.325   1.725   2.086   2.528   2.845
  21    1.323   1.721   2.080   2.518   2.831
  22    1.321   1.717   2.074   2.508   2.819
  23    1.319   1.714   2.069   2.500   2.807
  24    1.318   1.711   2.064   2.492   2.797
  25    1.316   1.708   2.060   2.485   2.787
  26    1.315   1.706   2.056   2.479   2.779
  27    1.314   1.703   2.052   2.473   2.771
  28    1.313   1.701   2.048   2.467   2.763
  29    1.311   1.699   2.045   2.462   2.756
  30    1.310   1.697   2.042   2.457   2.750
  40    1.303   1.684   2.021   2.423   2.704
  60    1.296   1.671   2.000   2.390   2.660
 120    1.289   1.658   1.980   2.358   2.617
  ∞     1.282   1.645   1.960   2.326   2.576
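Entries of Table 4 for other degrees of freedom can be approximated numerically. The sketch below integrates the t density with Simpson's rule and inverts the tail probability by bisection, using only the Python standard library. The step size, the number of bisection steps, and the function names are arbitrary illustrative choices, and the approach is meant for moderate n and α rather than as a production routine.

```python
import math

def t_pdf(x, n):
    """Density of Student's t with n degrees of freedom."""
    c = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)) / math.sqrt(n * math.pi)
    return c * (1 + x * x / n) ** (-(n + 1) / 2)

def t_tail(t, n, step=0.005):
    """P(T > t) for t >= 0, as 1/2 minus a Simpson's-rule integral over [0, t]."""
    if t == 0:
        return 0.5
    m = 2 * max(1, math.ceil(t / step / 2))  # even number of subintervals
    h = t / m
    s = t_pdf(0.0, n) + t_pdf(t, n)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 0.5 - s * h / 3

def t_critical(alpha, n):
    """Value t with P(T > t) = alpha, 0 < alpha < 1/2, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_tail(hi, n) > alpha:  # expand until the root is bracketed
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_tail(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print(round(t_critical(0.025, 10), 3), round(t_critical(0.05, 5), 3))
```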
Table 5 (Continued)


Degrees of freedom for numerator (m) 
Table 6. Random Normal Numbers, μ = 0 and σ = 1


01 02 03 04 05 06 07 08 09 10 


0.464 0137 2.455. —0.323 —0.068 0.290 —0.288 1,298 0.241 —0.957 
0.060 —2,526 —0.531 —0.194 0.543 —1.558 0.1837 —1.190 0.022 0.525 
1.486 —0.354 —0.634 ` 0.697 0.926 1.375 0.785 —0.963 —0.853 —1.865 
1.022 —0.472 1.279 3.521 . 0.571 —1.851 ..0.194 1.192 —0.501 —0.273 
1.394 —0.555 0.046 0.321 2.945 1.974 —0.258 0.412 ~ 0.439 —0.035 


0.906 —0.513 —0.525 0.595 0.881 d 11.879 0.161 —1.885 0.371 
1.179 —1.055 0.007 0.769 0.971 0.712 1.090 —0.631 —0.255 —0.702 
—1.501 —0.488 —0.162 —0.136 1.033 0.203 0.448 0.748 —0.423 —0.432 
—0.690 0.756 —1.618 —0.345 —0.511 —2.051 —0.457 —0.218 0.857 —0.465 
1.372 0.225 0.378 0.761 0.181 —0.736 0.960 —1.530 —0.260 0.120 


—0.482 1.678 —0.057 —1.229 —0.486 0.856 —0.491 —1.983 —2.830 —0.238 
—1.376 —0.150 1.356 —0.561 —0.256 —0.212 0.219 0.779 0.953 —0.869 
—1.010 0.598 —0.918 1.598 0.065 0.415 —0.169 0.313 —0,973 —1.016 
—0.005 —0.899 0.012 —0.725 1.147 —0.121 1.096 0.481 —1.691 0.417 

1.393 1.163 —0.911 1.231 —0.199 —0.246 1.239 —2.574 —0.558 0.056 


—1.787 —0.261 1.237 1.046 —0.508 —1.630 —0.146 —0.392 —0.627 0.561 
—0.105 —0.357 —1.384 0.360 —0.992 —0.116 —1.698 —2.832 —1.108 —2.357 
—1.339 1827 —0.959 0:424 0.969 —1.141 -1.04t 0.362 —1.726 1.956 
1.041 0.535 0.731 1.377 0.983 —1.330 1.620 —1.040 0.524 —0.281 
0.279 —2.056 0.717 —0.873 —1.096 —1.396 1.047 0.089 —0.573 0.932 


— 1.805 —2.008 —1.633 0.542 0.250 —0.166 0.032 0.079: 0.471 —1.029 
= 1.186 1.180 1.114 0.882 1.265 —0.202 0.151 —0.376 —0.310 0.479 

0.658 — 1.141 1.151 —1.210 —0.927 0.425 0.290 —0.902 0.610 2.709 
—0.439 0.358 — 1.939 0.891 —0.227 0.602 0.873 —0.437 —0:220 —0.057 
= 1.399 —0.230 0.385 —0.649 —0.577 0.237 —0.289 0.513 0.738 —0.300 


0.199. 0.208 — 1.083 —0,219 —0:291 1,221 1.119 0.004 —2.015 —0.594 
0.159 — 0.272 -+—0.313 0.084 —2.828 —0,430 —0.792 —1.275 —0.623 —1,047 
2.273 0.606 0.606 —0.747 0.247 1.291 0.063 —1.793 —0.699 —1.347 
0.041 —0.307 0121 0.790 —0.584 0.541 0.484 —0.986 0.481 0.996 
—1.132 —2.098 0.921 0.145 0.446 —1.661 1.045 —1.363 —0.586 —1.023 


0.768 0.079 —1.473 0.034 —2.127 0.665 0.084 —0,880 —0.579 0.551 
0.375 —1.658 —0.851 0.234 —0.656 0.340 —0.086 —0.158 —0.120 0.418 
—0.513 —0.344 0.210 —0.736 1.041 0.008 0.427 —0.831 0.191 0.074 
0.292 —0.521 1.266 —1.206 —0.899 0.110 —0.528 —0.813 0.071 0.524 
1.006 2990 —0.574 —0.491 —1.114 1.297 —1.433 —1.345 -3.001 0,479 


—1.334 1.278 —0.568 —0.109 —0.515 —0.566 2923 0.500 0.359 0.326 
—0.287 —0.144 —0.254 0,574 —0.45] —1.181 —1.190 —0.318 –0.094 1.114 
0.161 —0.886 —0.921 —0.509 1.410 —0.518 0.192 —0.432 1.501 1.068 
—1.3446 0.193 —1.202 0.394 —1.045 0.843 0.942 1.045 0.031 0.772 
1.250 —0.199 —0.288 1.810 1.378 0.584 1.216 0.733 0.402 0.226 


0.630  —0.537 0.782 0.060 0.499 —0.431 1.705 1.164 0.884 —0.298 
0.375 1.941 0.247 —0491 0.665 —0.135 —0.145 —0.498 0.457 1.064 


Table 6 (Continued) 
—1420 0.489 —1.711: —1.186 0.754 —0.732. —0.066 


—0.151 —0.243 —0.430 —0.762 0.298 1.049 1.810 
—0.300 0.531. 0.416. —1.541 1.456 - 2.040 —0.124 


0.424 —0.444 0.593 - 
0:593 0.658 —1.127 —1.407 —1.579 -1.616 1.458 


n 


1.006 - 
2.885 
0.196 


0.993 —0.106 0.116 0.484 —1.272 


1.262 


0.862 —0,885 20.142 —0.504 ^ 0.532 1.381 0022 —0.281 


0.235 —0.628 —0.023. —0.463 —0.899 —0.394 —0.538 
—0.853 0.402 077 0.833 0.410 —0.349 —1.094 


Source. From tables of the RAND Corporation, by permission. 


1.707 
0.580 


—0.798 ` 0.162 
—0.768 —0.129 
0.023 —1.204 
1.066 1.097 
0.736 —0.916 
0:342 1.222 
—0.188 —1.153 
1.395 1.298 


Table 7. Critical Values of the Kolmogorov-Smirnov One-Sample Test
Statistics
This table gives the values of $D^+_{n,\alpha}$ and $D_{n,\alpha}$ for which $\alpha \geq P\{D^+_n > D^+_{n,\alpha}\}$
and $\alpha \geq P\{D_n > D_{n,\alpha}\}$ for some selected values of n and α.


————————————————————————————_———_—_——__—_— 


One-Sided Test: 
a= 10  .05 
Two-Sided Test: 
a= .20 .10 
n=1 .900 .950 
2 .684 .776 
3 .565 .636 
4 .493 .565 
5. .441 (.509 
6 .410 .468 
7 .381 .436 
8 .358 .410 
9 ,339 .387 
10 .323 .369 
M .308 .352 
12 .296  .338 
13 .285 .325 
14 .275 .314 
15 .266 .304 
16 .258 ..295 
17 .250 .286 
18  .244 .279 
195. 5231 АП 
20 .232 .265 
Source. 


.025 


.01 .005 a= 
02 01 а=. 
.900 „995 ` n= 21 
.900 .929 22 
.785 .829 23 
.689 .734 24 
.627 669 25: 
1577.617 26 
.538. 576 27 
.507 .542 28 
480.513 29 
.457  .489 30 
.437  .468 31 
.419 449 ` 32 
-404 .432 33 
.390 ..418 м4 
т. 35 

.366 392 
,355 .381 31. 
.346 - .371 38 
.337 .361 39 
.329 .352 40 
Approximation 
for n > 40 


10 


1165 
107 


Уят 


.189 
1.22 


Ул 


+213 
.210 
1.36 


vn 


Ol .005 
о. шт 
32 34 
2314.337 
307 1330 
2301.323 
295 {317 
2% эп 
284 .305 
2% .300 
1275 295 
270 290 
266 .285 
262 .281 
258 277 
254. 2 
251,266 
.247 265 
244 .% 
21.2% 
238 255 
235.252 
1.52 1.63 
Уп y." 


Adapted by permission from Table 1 of Leslie H. Miller, Table of percentage
points of Kolmogorov statistics, J. Am. Stat. Assoc. 51 (1956), 111-121.
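The large-sample approximation listed in Table 7 for n > 40 has the form c/√n, with c = 1.07, 1.22, 1.36, 1.52, and 1.63 for one-sided α = .10, .05, .025, .01, and .005. These constants agree, to two decimals, with the classical limiting result P{√n D⁺_n > c} → exp(−2c²), which gives c = √(−(ln α)/2). The sketch below assumes that this limiting formula is the source of the tabulated constants. The function names are illustrative.

```python
import math

def ks_one_sided_constant(alpha):
    """Constant c with exp(-2 c^2) = alpha, from the limiting law of sqrt(n) D_n^+."""
    return math.sqrt(-math.log(alpha) / 2.0)

def ks_one_sided_critical(n, alpha):
    """Large-sample approximation c / sqrt(n) to the one-sided critical value."""
    return ks_one_sided_constant(alpha) / math.sqrt(n)

if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
        print(alpha, round(ks_one_sided_constant(alpha), 2),
              round(ks_one_sided_critical(50, alpha), 3))
```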
Table 8. Critical Values of the Kolmogorov-Smirnov Test Statistic for
Two Samples of Equal Size


This table gives the values of $D^+_{n,n,\alpha}$ and $D_{n,n,\alpha}$ for which $\alpha \geq P\{D^+_{n,n} > D^+_{n,n,\alpha}\}$
and $\alpha \geq P\{D_{n,n} > D_{n,n,\alpha}\}$ for some selected values of n and α.


One-Sided Test: ix 

а= .10 .05 —025 01 .005 ‘а= 10 .05 .025 .01 .005 
Two-Sided Test: 

а= 20. 10 .05 .02 101 а= .20 10 :05 .02 .01 


n=3 2/3 2/3 ў n=20 6/20 17/20. 8/20. 9/20: 10/20 
43/4344 3/4 5 2r 6/21 7/21 . 8/21 . 9/21 “10/21 
5 3/5 3/5 4|5 4/5. 4/5 22 7/22 8/22 8/22 10/22 10/22 
‚6 3/6 4/6 4/6 5/6 5/6 23 7/23 8/23 _ 9|23 10/23 10/23 
pol 41 47. SIT. 5/7. sm 24 7/24 8/24 9/24 10/24 11/24 
"48 4/8. 4/80 5/8 5/8 6/8 25 7/25 ^ 8/25 .9]25 10/25" 11/25 
49 4|9 5/9, 5/9. 6/9. 6/9 26 7/26 . 8/26 | .9/26. 10/26. 11/26 


10 4/10 5/10 6/10. 6/10, 7/10 27 7/27 .8/[27  9|[27. 11/27. 11/27 
.1 5/11 5/1 6/11. 7/11 .7/11 28 8/28 9/28 10/28 11/28. 12/28 
12 5/12 5/12 6/12. 7/12. 7/12 29 8/29 9/29 10/29 11/29. 12/29 
B 5[13 6/13 6/13. 7/13. 8/13 30 8/30 9/30 . 10/30 . 11/30, 12/30 


2214 IK 6/14 7/14. 7/14 8/14 31 8/31 9/31 ..10/31 .. 11/31; 12/31 
15 6/15 7/15. 8/15 8/15 32, 8/32 „19/32 ..10/32. 12/32. 12/32 
16 6/16 6/16 7/16. 8/16 9/16. * 34 8/34 10/34 , 11/34 . 12/34- 13/34 
SI 6/17 7117 7/7. 8/17 9/17 36 9/36 10/36 11/36 12/36 13/36 
18 6/18 7[18 8/18. 9/18 9/18 38 9/38 10/38 11/38 . 13/38. 14/38 
У 19 6/19 7/19 8/19 9/19 9/19 40 9/40 10/40 12/40 13/40 14/40 
tar Approximation — L52 — 1.73 192 215 2.30 
M. M 8i ©) for n» 40: vn Vn in vn Vn 
Adapted by permission from Tables 2 and 3 of Z. W. Birnbaum and R. A. Hall,
Small sample distributions for multisample statistics of the Smirnov type, Ann. Math.
Stat. 31 (1960), 710-720.
Table 9. Critical Values of the Kolmogorov-Smirnov Test Statistic for Two
Samples of Unequal Size


This table gives the values of $D^+_{m,n,\alpha}$ and $D_{m,n,\alpha}$ for which $\alpha \geq P\{D^+_{m,n} > D^+_{m,n,\alpha}\}$
and $\alpha \geq P\{D_{m,n} > D_{m,n,\alpha}\}$ for some selected values of
N_1 = smaller sample size, N_2 = larger sample size, and α.


One-Sided Test: 


$ а = 10 D -05 р .025 .01 .005 
Two-Sided Test: X : 
& п a =t .20 .10 .05 ; .02 .01 
N-1 № = 9 17/18 
[10 9/10 
МАЕ 2 м= 3 516 
4 3/4 | 
5 4/5 4/5 
6 5/6 5/6 
7 5/7 6/7 
8 3/4 7/8 7/8 
9 7/9 8/9 8/9 
10 7/10 n 9/10 
N, = № = 4 3/4 3/4 } 
5 2/3 4/5 4/5 
6 2/3 2/3 5/6 
7 2/3 5/7 6/7 6/7 
8 5/8 3/4 3/4 7/8 
9 D 2/3 7/9 8/9 8/9 
10 3/5 7/10 415 9/10 9/10 
12 7/12 aa 34 5/6 11/12 
Lu Mes 35 3/4 415 FEN: 
би 6 712 2/3 3/4 576 5/6 
7 1з 5/7 3/4 6/7 em 
8 5]8 5/8 3/4 7/8 7/8 
9 5/9 2/3 3/4 19 .— 8/9 
10 70/20. 13/20, >. 1/10 415 415 
12 7/12 2/3 * 2/3 3/4 5/6 
16 9/16 5/8 11/16 3/4 13/16 
a 26 3/5 2/3 2/3 5/6 5/6 
se a у 23/35 5/7 29/35 6/7 
8 11/20 5/8 27/40 45 415 
9 5/9 3/5 31/45 119 415 
10 1/2 3/5 7/10 110 415 
15 8/15 3/5 2/3 1115 10115 
20 1/2 11/20 3/5 7/10 3/4 
x 217 230 4/7 29/42 sn 5/6 
ИЕ М |; 112 2/3 3/4 3/4 
9 1/2 5/9 2/3 13/18 719 


Source. Adapted by permission from F. J. Massey, Distribution table for the deviation
between two sample cumulatives, Ann. Math. Stat. 23 (1952), 435-441.


Table 10. Critical Values of the Wilcoxon Signed Ranks Test Statistic


This table gives values of $t_\alpha$ for which $P\{T^+ > t_\alpha\} \leq \alpha$ for selected values
of n and α. Critical values in the lower tail may be obtained by symmetry
from the equation $t_{1-\alpha} = n(n + 1)/2 - t_\alpha$.


                 α
  n     .01    .025     .05     .10
  3       6       6       6       6
  4      10      10      10       9
  5      15      15      14      12
  6      21      20      18      17
  7      27      25      24      22
  8      34      32      30      27
  9      41      39      36      34
 10      49      46      44      40
 11      58      55      52      48
 12      67      64      60      56
 13      78      73      69      64
 14      89      83      79      73
 15     100      94      89      83
 16     112     106     100      93
 17     125     118     111     104
 18     138     130     123     115
 19     152     143     136     127
 20     166     157     149     140


Source. Adapted by permission from Table 1 of R. L. McCornack, Extended tables of
the Wilcoxon matched pairs signed-rank statistics, J. Am. Stat. Assoc. 60 (1965), 864-871.
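The critical values in Table 10 can be recomputed from the exact null distribution of T⁺. Under the null hypothesis each rank 1, ..., n is included in T⁺ independently with probability 1/2, so the number of sign patterns giving T⁺ = s is the number of subsets of {1, ..., n} with sum s. The sketch below counts these by dynamic programming and returns the smallest t with P{T⁺ > t} ≤ α, which is the convention assumed here for the table. The function names are illustrative.

```python
def signed_rank_counts(n):
    """counts[s] = number of subsets of {1, ..., n} whose elements sum to s."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        # go downward so that each rank is used at most once
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return counts

def upper_critical(n, alpha):
    """Smallest t such that P(T+ > t) <= alpha under the null hypothesis."""
    counts = signed_rank_counts(n)
    total = 2 ** n
    t = len(counts) - 1  # P(T+ > max) = 0
    tail = 0             # number of outcomes with T+ > t
    while t > 0 and tail + counts[t] <= alpha * total:
        tail += counts[t]
        t -= 1
    return t

if __name__ == "__main__":
    for n in (5, 7, 14):
        print(n, [upper_critical(n, a) for a in (0.01, 0.025, 0.05, 0.10)])
```

The lower-tail value then follows from the symmetry relation quoted with the table, n(n + 1)/2 − t_α.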


Table 11. Critical Values of the Mann-Whitney-Wilcoxon Test Statistic


This table gives the values of $u_\alpha$ for which $P\{U > u_\alpha\} \leq \alpha$ for some selected
values of m, n, and α. Critical values in the lower tail may be obtained by
symmetry from the equation $u_{1-\alpha} = mn - u_\alpha$.


                                   n

  m     α      2    3    4    5    6    7    8    9   10

  2   .01      4    6    8   10   12   14   16   18   20
      .025     4    6    8   10   12   14   15   17   19
      .05      4    6    8    9   11   13   14   16   18
      .10      4    5    7    8   10   12   13   15   16

  3   .01           9   12   15   18   20   23   25   28
      .025          9   12   14   16   19   21   24   26
      .05           8   11   13   15   18   20   22   25
      .10           7   10   12   14   16   18   21   23

  4   .01               16   19   22   26   29   32   36
      .025              15   18   21   24   27   31   34
      .05               14   17   20   23   26   29   32
      .10               12   15   18   21   24   26   29

  5   .01                    23   27   31   35   39   43
      .025                   22   26   29   33   37   41
      .05                    20   24   28   31   35   38
      .10                    19   22   26   29   32   36

  6   .01                         32   37   41   46   51
      .025                        30   35   39   43   48
      .05                         28   33   37   41   45
      .10                         26   30   34   38   42

  7   .01                              42   48   53   58
      .025                             40   45   50   55
      .05                              37   42   47   52
      .10                              35   39   44   48

  8   .01                                   54   60   66
      .025                                  50   56   62
      .05                                   48   53   59
      .10                                   44   49   55

  9   .01                                        66   73
      .025                                       63   69
      .05                                        59   65
      .10                                        55   61

 10   .01                                             80
      .025                                            76
      .05                                             72
      .10                                             67


Source. Adapted by permission from Table 1 of L. R. Verdooren, Extended tables of
critical values for Wilcoxon's test statistic, Biometrika 50 (1963), 177-186, with the kind
permission of Professor E. S. Pearson, the author, and the Biometrika Trustees.
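The entries of Table 11 can likewise be recomputed from the exact null distribution of U. The sketch below uses the standard counting recursion: in the combined ordered sample the largest observation belongs either to the first sample of size m, contributing n to U, or to the second sample, contributing nothing. It returns the smallest u with P{U > u} ≤ α, the convention assumed here for the table. The function names are illustrative.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def u_count(m, n, u):
    """Number of arrangements of m x's and n y's with Mann-Whitney count U = u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # largest observation is an x (adds n to U) or a y (adds nothing)
    return u_count(m - 1, n, u - n) + u_count(m, n - 1, u)

def upper_critical_u(m, n, alpha):
    """Smallest u such that P(U > u) <= alpha under the null hypothesis."""
    total = comb(m + n, m)
    u = m * n  # P(U > mn) = 0
    tail = 0   # number of arrangements with U > u
    while u > 0 and tail + u_count(m, n, u) <= alpha * total:
        tail += u_count(m, n, u)
        u -= 1
    return u

if __name__ == "__main__":
    print([upper_critical_u(10, 10, a) for a in (0.01, 0.025, 0.05, 0.10)])
```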
Table 12. Critical Points of Kendall's Tau Test Statistic


This table gives the values of $S_\alpha$ for which $P\{S > S_\alpha\} \leq \alpha$, where $S = \binom{n}{2} T$,
for some selected values of α and n. Values in the lower tail may be
obtained by symmetry, $S_{1-\alpha} = -S_\alpha$.


                 α
  n    .100    .050    .025     .01
  3       3       3       3       3
  4       4       4       6       6
  5       6       6       8       8
  6       7       9      11      11
  7       9      11      13      15
  8      10      14      16      18
  9      12      16      18      22
 10      15      19      21      25


Source. Adapted by permission from Table 1, page 173, of M. G. Kendall, Rank
Correlation Methods, 3rd Ed., Griffin, London, 1962. For values of n > 11, see W. J.
Conover, Practical Nonparametric Statistics, John Wiley, New York, 1971, page 390.


Table 13. Critical Values of Spearman's Rank Correlation Statistic


This table gives the values of $R_\alpha$ such that $P\{R > R_\alpha\} \leq \alpha$ for some
selected values of n and α. Critical values in the lower tail may be obtained
by symmetry, $R_{1-\alpha} = -R_\alpha$.


                 α
  n     .01    .025     .05     .10
  3   1.000   1.000   1.000   1.000
  4   1.000   1.000    .800    .800
  5    .900    .900    .800    .700
  6    .886    .829    .771    .600
  7    .857    .750    .679    .536
  8    .810    .714    .619    .500
  9    .767    .667    .583    .467
 10    .721    .636    .552    .442


Source. Adapted by permission from Table 2, pages 174-175, of M. G. Kendall, Rank
Correlation Methods, 3rd Ed., Griffin, London, 1962. For values of n > 11, see W. J.
Conover, Practical Nonparametric Statistics, John Wiley, New York, 1971, page 391.


Glossary of Some Frequently 
Used Symbols and 


Abbreviations 


+3 4 


{+ 
"sg 

P(x) 

Tim, lim, lim 
аа, 

9,9, 

1, 

&х) 

u 


= э 


implies 

implies and is implied by 

converges to 

increasing, decreasing 

nonincreasing, nondecreasing 

gamma function 

limit superior, limit inferior, limit 
real line, n-dimensional Euclidean space
Borel σ-field on ℛ, Borel σ-field on ℛ_n
indicator function of set A

= 1 if x ≥ 0, and = 0 if x < 0

EX, expected value

EX^n, n > 0 integral

E|X|^α, α > 0

E(X − EX)^k, k > 0 integral

= μ_2, variance

first, second, third derivative of f
distributed as

asymptotically (or approximately) equal to
convergence in law

convergence in probability
convergence almost surely
convergence in rth mean

random variable

distribution function

probability density function
probability mass function
probability generating function
mgf             moment generating function
d.f.            degrees of freedom
BLUE            best linear unbiased estimate
MLE             maximum likelihood estimate
MVUE            minimum variance unbiased estimate
UMA             uniformly most accurate
UMVUE           uniformly minimum variance unbiased estimate
UMAU            uniformly most accurate unbiased
MP              most powerful
UMP             uniformly most powerful
i.o.            infinitely often
iid             independent, identically distributed
SD              standard deviation
MLR             monotone likelihood ratio
MSE             mean square error
WLLN            weak law of large numbers
SLLN            strong law of large numbers
CLT             central limit theorem
SPRT            sequential probability ratio test
b(1, p)         Bernoulli with parameter p
b(n, p)         binomial with parameters n, p
NB(r; p)        negative binomial with parameters r, p
P(λ)            Poisson with parameter λ
U[a, b]         uniform on [a, b]
G(α, β)         gamma with parameters α, β
B(α, β)         beta with parameters α, β
χ²(n)           chi-square with d.f. n
𝒞(μ, θ)         Cauchy with parameters μ, θ
𝒩(μ, σ²)        normal with mean μ, variance σ²
t(n)            Student's t with n d.f.
F(m, n)         F-distribution with (m, n) d.f.
z_α             100(1 − α)th percentile of 𝒩(0, 1)
χ²_{n,α}        100(1 − α)th percentile of χ²(n)
t_{n,α}         100(1 − α)th percentile of t(n)
F_{m,n,α}       100(1 − α)th percentile of F(m, n)
properties of, 168, 169, 171, 172
Confidence, bounds, 467-468 
coefficient, 468 
estimation problem, 467 
Confidence interval, 468 
Bayesian, 495 
expected length of, 491 
fixed-width, 469-470 
general method of construction, 471
level of, 468 
length of, 469 
for the parameter of, Bernoulli, 475 
discrete uniform, 479 
exponential, 478 
normal, 469, 470, 478, 478 
uniform, 476
shortest-length, 479 
from tests of hypotheses, 486 
UMAU, 491 
UMAU family of, for difference of two 
normal means, 490 
for normal mean, 489, 491 
for normal variance, 490 
for ratio of two normal variances, 490
unbiased, 490 
using Chebychev's inequality, 474 
using CLT, 478
using properties of MLE's, 474 
Confidence set, 468 
for mean and variance of normal, 471 
UMA family of, 469 
UMAU family of, 488 
unbiased, 488 
Consistent estimate, 335 
for parameter of, 5:090 335 
normal, 336 
uniform, 348 
Contaminated normal, 581-582, 584 
Contingency table, 565-566 
Continuity correction, 299 
Continuity theorem, 277, 294 
Continuous type distributions, 202-227 


Convergence, a.s., 249, 263, 265, 270


equivalence, 266 
in distribution = weak, 240-241, 246, 253,
277, 280 
of mgf's, 277 
modes of, 240, 257 
of moments, 242, 247, 248, 261 
of pdf's, 243 
of pmf's, 242 
in probability, 243, 247, 250, 252, 253, 
257, 335, 375 
in rth mean, 247 
Convex function, 4, 104 
Convolution of df's, 142, 284 
Correlation coefficient, 174, 175 
Countable additivity, 24 
Covariance, 155 
between sample mean and sample variance, 179, 323
sample, 302 
Critical region, 406 


Decision function, 388, 592 
Degenerate rv, 62, 89, 181 
Degrees of freedom when pooling classes, 
450 
DeMorgan’s laws, 1 
Density function, probability, 63 
Discrete distributions, 181-202 
Dispersion matrix = variance - covariance 
matrix, 233, 236 
Distribution, conditional, 112 
of a function of an rv, 68 
a posteriori, 390 
a priori, 390 
of sample mean, 306 
of sample median, 308 
of sample quantile, 151, 308 
of sample range, 142, 308
Distribution function, 56-58, 106, 107-108 
continuity points of a, 64, 280 
of a continuous type rv, 63, 109 
convolution, 142, 284 
decomposition of a, 66 
discontinuity points of a, 56 
of a discrete type rv, 66, 108 
of a function of an rv, 68, 69, 70-71, 73-75, 133, 134-137
of an rv, 56-58, 106-108 
Domain of attraction, 280
of normal, 291 
Efficiency of an estimate, 369
relative, 369
Empirical df = sample df, 299, 310 
Equivalence lemma, 266 
Equivalent rvs, 124, 266 
Estimable function, 350, 532 
Event, certain, 26 
elementary = simple, 21 
disjoint = mutually exclusive, 24, 46 
independent, 46, 47-48 
null, 21 
Expectation, conditional, 168, 170, 172, 
414 
properties, 168, 169, 171, 172 
Expected value = mean = mathematical ex- 
pectation, 78-79, 80, 84 
of a function of rv, 80, 154 
of product of rv's, 157-158 
of sum of rv's, 154 
Exponential distribution, 137, 207 
characterizations, 209, 212 
memory of, 209
mgf, 207 
moments, 207 
Exponential family, 236, 347 
k-parameter, 238 
one-parameter, 236, 370, 419, 422, 428 


Fisher's Z-statistic, 320 
Fitting of distribution, binomial, 452 
normal, 449 
Poisson, 449 
Fréchet, Cramér, and Rao inequality, 361- 
363, 382, 386 
Fréchet, Cramér, and Rao lower bound for, 
binomial, 368 
exponential, 372 
geometric, 372 
normal, 372 
Poisson, 363
F-distribution, central, 317, 322, 504
moments of, 318 
noncentral, 319 
moments of, 320 
F-test, 430 
of general linear hypothesis, 502 
as likelihood ratio test, 438 
for testing equality of variances, 457 
Fundamental identity of sequential analysis, 
633-634 
applications, 634-635


Gambler's ruin problem, 594-595, 615 
Gamma distribution, 206 
bivariate, 118 
characterizations, 208-209, 225 
mgf, 207
moments, 207 
General linear hypothesis, 498 
canonical form, 503 
estimation in, 499 
likelihood ratio test of, 502
Generating functions, 93 
moment, 95, 161 
probability, 93 
Geometric distribution, 94 
characterizations, 189-192, 201 
memory of, 190 
mgf, 188 
moments, 188 
Glivenko-Cantelli theorem, 300


Helmert orthogonal matrix, 139, 323 
Hölder's inequality, 165
Hypergeometric distribution, 192 
bivariate, 119 
mean and variance, 192 


Identically distributed rv's, 123, 126
Implication rule, 28 
Improper integral, 12 
Independence and correlation, 175, 179,
229, 233
Independence of events, 46
complete = mutual, 47
pairwise, 47-48
Independence of rv's, 120, 121, 122, 123,
124, 125, 162-163
complete = mutual, 122-123
pairwise, 122
Independent, identically distributed rv's,
123, 142
Indicator function, 54
Infinite, integrals, 12
product, 8-9
Infinitely often, 264 
Infinitesimal rv's, 288 
Interchange of limit operations, 11, 12 
Invariance, of hypothesis testing problem, 
481 
principle, 338
Invariant, class of distributions, 337, 430
estimates, 337-338
maximal, 431
tests, 431, 432 


Joint, df, 118 
distribution, 120 
pdf, 109, 110 
pmf, 108 

Jump, 9 
size of, 9, 57 

Jump point, of a df, 57 
of a function, 9 


Kendall's sample tau, 568 
distribution of, 569-570 
Kendall's tau coefficient, 567 
Kendall's tau test, 571 
Kolmogorov's, inequality, 268 
strong law of large numbers, 274 
Kolmogorov-Smirnov one sample statistic,
589
for confidence bounds on df, 544
distribution, 540, 541
for estimating population df, 544
Kolmogorov-Smirnov test, comparison with
chi-square test, 543
one-sample, 542
two-sample, 558
Kolmogorov-Smirnov two sample statistic,
557
distribution, 557
Kronecker lemma, 269 
Kurtosis, 93 


Laplace distribution, 99 
Laplace transform, 13
bilateral, 13
convergence, differentiation and uniqueness of, 13-14
unilateral, 13
Least square estimation, 173, 174, 499
principle, 172, 499 
L'Hospital rule, 4 
Likelihood, equal, 18 
equation, 377, 382 
function, 376 
Likelihood ratio test, 435-436 
asymptotic distribution, 442 
for general linear hypothesis, 502
for parameter of, binomial, 436, 444 
for simple vs. simple hypotheses, 436 
bivariate normal, 444 
discrete uniform, 443 
exponential, 444 
normal, 437, 438, 443 
Limit inferior, 2 
set, 2 
superior, 2, 264 
Lindeberg central limit theorem, 282, 291 
Lindeberg condition, 282, 288, 290 
Linear model, 498 
Linear regression model, 500, 506 
confidence intervals, 509, 510, 511 
estimation, 506-507 
testing of hypotheses, 507, 508, 509 
Lognormal distribution, 226, 375 
Loss function, 388 
Lower bound for variance, Chapman, Robbins and Kiefer inequality, 365
Fréchet, Cramér and Rao inequality, 361
Lyapunov condition, 290 
Lyapunov inequality, 103, 167 


Maclaurin expansion, 7 
of an mgf, 96, 99 
Mann-Whitney statistic, 537, 561 
moments, 537-538, 563-564 
null distribution, 561-562 
Mann-Whitney-Wilcoxon test, 561 
Marginal, df, 111-112, 118, 120 
pdf, 111, 115 
pmf, 110 
Markovian dependence, 115
Markov's inequality, 100, 104 
Maximal invariant statistic, 431 
function of, 432 
Maximum likelihood estimate, 376 
asymptotic normality, 384-385 
consistency, 384-385 
as a function of sufficient statistic, 381 
invariance property, 383
Maximum likelihood estimation, in case of
missing observations, 387
principle, 376
Maximum likelihood estimation method
applied to, Bernoulli, 379-380, 383,
8 
binomial, 386, 387
bivariate normal, 387
Cauchy, 387 
discrete uniform, 378, 386 
exponential, 383, 387 
gamma, 381 
geometric, 386 
hypergeometric, 378 
normal, 377, 387 
Poisson, 386
uniform, 379, 382 
Median, 91, 92 
Median test, 559 
Memory, of exponential, 209 
of geometric, 190 
Method of moments, 375 
applied to, beta, 375 
binomial, 374 
gamma, 375 
lognormal, 375 
normal, 375 
Poisson, 374 
uniform, 374 
Minimal sufficient statistic, 402 
for Bernoulli, 403 
for beta, 403 
for gamma, 403 
for geometric, 403 
for normal, 402, 403 
for Poisson, 403 
for uniform, 403 
Minimax, estimate, 389, 394 
principle, 389 
Minimax estimation for parameter of, 
Bernoulli, 398 


binomial, 394, 398


hypergeometric, 396 

uniform, 398
Minimum mean square error estimate, 352 

for variance of normal, 358 
Minkowski inequality, 165
Missing observations, estimation, 360, 387 
Mode, 376 


Moment, about origin, 81, 89 


absolute, 81, 86, 103 

central, 89, 155 

condition, 84 

of conditional distribution, 168 

of df, 78 

of functions of random vectors, 155
inequalities, 100-103
of intercept of sample line of regression,
332 
lemma, 86 
negative-order, 93 
non-existence of order-, 87 
of sample covariance, 305 
of sample mean, 303 
of sample regression coefficient, 331 
of sample variance, 304-305 
Moment generating function, 95, 161 
continuity theorem for, 277, 294 
differentiation, 96, 99, 162
expansion, 96, 99 
limiting, 276 
of sample mean, 306 
of sum of independent rv's, 145 
uniqueness, 96, 99 
Monotone likelihood ratio, 419
for hypergeometric, 421
for one-parameter exponential family,
419 
UMP test for families with, 420 
for uniform, 419 
Monotonic function, 9 
Most efficient estimate, 369 
asymptotically, 370 
as MLE, 383 
and UMVUE, 370 
Most powerful test, 407 
for families with MLR, 420 
as a function of sufficient statistic, 414 
invariant, 432 
Neyman-Pearson, 412 
similar, 426 
unbiased, 425 
uniformly, 408 
Multidimensional rv = random vector, 105- 
106 
Multinomial coefficient, 39 
Multinomial distribution, 197-198 
characterization, 199 
mgf, 198 
moments, 199 


Negative binomial (= Pascal or waiting time) 
distribution, 186-187 
bivariate, 118, 133 
central term, 200 
mean and variance, 187 
mgf, 187 
Neyman-Pearson lemma, 412, 463 
Neyman-Pearson lemma applied to, 
Bernoulli, 415 
normal, 415, 417 
Nonparametric = distribution-free estima- 
tion, 299, 531 
methods, 296-297, 530-588 
Nonparametric unbiased estimation, 531 
of distance between two df's, 535 
of population mean, 533 
of population variance, 533 
of tail probability, 535 
Norm, 283 
Normal approximation, to binomial, 292 
to Poisson, 293 
Normal distribution = Gaussian law, 219 
bivariate, 139, 172, 227 
characterizations, 222, 223, 224 
contaminated, 581-582, 584 
as limit of binomial, 281 
as limit of chi-square, 281 
as limit of Poisson, 279 
mgf, 220 
moments, 220-221 
multivariate, 231 
singular, 228 
as stable distribution, 280 
standard, 116, 219 
tail probability, 222 
truncated, 116 
Normal equations, 173, 499 
Normed measure, 19 
Nuisance parameter, 489 


Odds, 25 
Operator, linear, 284 
Ordered samples, 36 
Orders of magnitude, o and O notation, 6 
Order statistic, 149, 299 
is complete and sufficient, 532 
joint pdf, 150 
joint marginal pdf, 151-152 
marginal pdf, 151 
uses, 576, 579 


Parameter, of a distribution, 78 
estimable, 350, 532 
location, 349 
order, 78, 90-91 
scale, 349 
space, 334 
Parametric statistical hypothesis, 405 
alternative, 405 
composite, 405 
null, 405, 410 
problem of testing, 388, 405-406, 407, 
408 
simple, 405 
Parametric statistical inference, 296 
Pareto distribution, 92 
Partition, 399 
coarser, 400 
finer, 400 
minimal sufficient, 400 
reduction of a, 400 
sets, 399 
sub-, 400 
sufficient, 399 
Pivot, 480 
Point estimate, 333, 394 
Point estimation, problem of, 334, 335 
Poisson df, as incomplete gamma, 212 
Poisson distribution, 69, 130, 194 
central term, 200 
characterizations, 195, 196-197, 201 
coefficient of skewness, 93 
kurtosis, 93 
as limit of binomial, 200, 278 
as limit of negative binomial, 201 
mean and variance, 194-195 
mgf, 195 
moments, 93 
pgf, 94, 195 
Polya distribution, 193 
Population, 296 
Population distribution, 297 
Power series, 7 
Principle of least squares, 172, 499 
Probability, 18, 24 
addition rule, 26 
axioms, 24 
conditional, 41, 112 
continuity of, 30, 35 
countable additivity of, 24 
density function, 63 
distribution, 56 
equally likely, 18 
equally likely assignment, 24, 36 
on finite sample spaces, 35 
generating function, 95, 99, 148 
geometric, 19 
integral transformation, 203 
mass function, 61 
multiplication rule, 42 
posterior and prior, 44 
principle of inclusion-exclusion, 27 
space, 25 
subadditivity, 26 
tail, 83 
total, 42 
uniform assignment of, 24, 36 


Quadratic form, 5-6, 108 
Quantile of order p = (100p)th percentile, 
90-91 


Random, 18 
Random experiment = statistical experiment, 
20 
Random interval, 467 
coverage of, 577-578 
Random sample, 297 
from a finite population, 30, 37 
from a probability distribution, 30, 297 
Random sampling, 297 
in geometric probability, 30 
Random set, family of, 467 
Random variable, 53, 106 
continuous type, 63 
discrete type, 61 
equivalent, 124, 266 
functions of a, 67-75 
interchangeable, 126 
standardized, 90 
symmetric, 142 
symmetrized, 145 
truncated, 116-117, 260 
Random vector, 105-106 
continuous type, 109 
discrete type, 108 
functions of, 127-147 
Random walk, 615 
and sequential probability ratio test, 616 


lines, 178, 175, 178 
174 


Regularity conditions of FCR inequality, 362 
Riesz inequality, 104 
Risk function, 388 
Robustness, of chi-square test, 586-587 
of sample mean as an estimate, 581-583 
of sample standard deviation as an estimate, 583-585 
of Student's t-test, 585-586, 588 
Robust procedure, defined, 581 


Sample, 296-297 
correlation coefficient, 302, 310 
covariance, 302 
df, 299, 310 
line of regression, 302 
mean, 298, 301 
median, 302 
mgf, 301 
moments, 301, 302 
point, 21 
quantile of order p, 302 
space, 20-21, 297 
statistic, 298 
variance, 298, 453 
Sampling with and without replacement, 36- 
37 
Sampling from bivariate normal, 325, 454 
distribution of sample correlation coefficient, 327 
distribution of sample regression coefficient, 327 
independence of sample mean vector and 
dispersion matrix, 326 
Sampling from univariate normal, 321, 444, 
452-458, 457 
distribution of sample variance, 322 
distribution of (X̄, S²), 321 
independence of X̄ and S², 139, 321 
Sequential estimation, 593-595, 596, 602 
fixed-width interval, 606, 610 
lower bound for variance, 599-600 
of normal mean, 593, 602 
of parameter of geometric, 593 
unbiased, 596-601, 602 
Sequential probability ratio test, 612 
approximating the power function, 624 
approximation of average sample size, 627 
approximation of stopping bounds, 617 
bounds for stopping bounds, 616 
optimal property, 631 
properties, 620 
saving in number of observations, 629-631 
strength, 613 
Sequential statistical procedure, 592 
closed, 593 
Set function, 3 
Shortest-length confidence interval, 479 
for the mean of normal, 480-481 
for the parameter of beta, 485 
for the parameter of exponential, 485 
for the parameter of uniform, 484 
for the variance of normal, 482 
σ-field, 2 
choice of, 21 
generated by a class = smallest, 2 
Sign test, 545 
Similar tests, 425, 426 
Single-sample problem, 538-553 
of fit, 539 
of location, 545 
and symmetry, 545 
Skewness, coefficient of, 93 
Slow variation, 87, 291 
Spearman's rank correlation coefficient, 571 
distribution, 572-573 
Stable distribution, 218-219, 280 
Standard deviation, 89 
Standardized rv, 90 
Statistic of order k, 149 
marginal pdf, 151 
Stein’s two-stage procedure, 606 
Stirling’s approximation, 8, 201 
Stochastically larger, 554 
Stopping region, rule and rv, 592 
Strong law of large numbers, 263, 265, 273, 
274 
Borel’s, 273 
Kolmogorov's, 274 
Student's t-distribution, central, 315, 322, 
323, 331, 511 
bivariate, 332 
moments, 316 
noncentral, 317 
moments, 317 
Student's t-test, 428-429, 452-455 
as likelihood ratio test, 437 
robustness of, 585-586, 588 
Sufficient statistic, 339, 355, 371, 400, 531 
factorization criterion, 341 
joint, 342 
Sufficient statistic for, Bernoulli, 340, 349 
beta, 348 
discrete uniform, 343 
gamma, 348 
lognormal, 349 
normal, 339, 343, 348, 349 
Poisson, 340 
uniform, 344, 349 
Support, of a function, 4 
line of, 4, 104 
Symmetric df or rv, 80, 142-143, 144, 145 
Symmetrization, 145 
Symmetrized rv, 145 
Symmetry, center of, 80 


Tail-equivalence, 266 
Taylor’s expansion, 6 
Cauchy and Lagrange form of remainder, 
6-7 
Test, 406 
critical region, 406 
of hypothesis, 406, 444, 452 
level of significance, 407, 410 
most powerful, 407 
nonrandomized, 407 
power function, 407 
randomized, 407 
size, 407 
statistic, 408 
Testing the hypothesis of, equality of several 
normal means, 514 
goodness of fit, 448, 542 
homogeneity, 554, 558, 559, 561 
independence, 511, 565, 569, 572 
Testing hypothesis for parameter of, 
Bernoulli, 424, 614, 6 
binomial, 436, 443 
bivariate normal, 444, 454 
discrete uniform, 424 
exponential, 424, 444, 4, 620 
gamma, 424 
geometric, 434, 443, 620 
hypergeometric, 421 
multinomial, 447 
normal, 422, 423, 428, 429, 430, 432, 
437, 438, 443, 461, 463, 614, 620 
Poisson, 424, 434, 620 
several binomial, 445 
several normal, 464 
two normal, 453, 457 

Tests of hypothesis, Bayes, 460 
likelihood ratio, 435-436 
minimax, 462 
Neyman-Pearson, 412 
Tests of hypothesis listed, chi-square tests, 
428, 429, 444, 445, 447, 448 
F-tests, 430, 457 
t-tests, 428, 429, 452-455 
Tolerance interval, 575 
Transformation, 133, 135, 150 
of continuous type, 134-136 
of discrete type, 131, 133 
Helmert, 139, 323 
Jacobian of, 135, 136, 150 
not one-to-one, 133, 136, 150 
one-to-one, 131, 134-136 
Transition probability, 115 
Triangular distribution, 65 
Trinomial distribution, 199 
Truncated distribution, 116 
Truncation, 116, 260 
Two-point distribution, 182 
Two-sample problems, 553-579 
Types of error in testing hypotheses, 406 


Unbiased confidence interval, 488, 490 
general method of construction, 492 
for mean of normal, 491 
for parameter of exponential, 494 
for parameter of uniform, 494 
for variance of normal, 492 

Unbiased estimate, 350, 532 
best linear, 352 
and complete sufficient statistic, 356 
sufficient statistic, 355 
52, 370, 534 
Unbiased estimation for parameter of, 
Bernoulli, 351, 357, 358 
1, 363 
bivariate normal, 359, 372 
discrete uniform, 357, 359 
exponential, 372 
gamma, 355, 359 
hypergeometric, 359 
negative binomial, 358 
normal, 350, 356-357, 358, 372 
Poisson, 351, 354, 358, 359, 565 
uniform, 363 


Unbiased test, 425 
for mean of normal, 426 
and similar test, 426 
UMP, 425 
Uncorrelated rv's, 174 
Uniform distribution, 70, 122, 129, 
202 
characterization, 205 
discrete, 81-82, 183 
generating samples, 204 
mgf, 146, 208 
moments, 203 
statistic of order k, 216 
U-statistic, 533 
for estimating mean and variance, 533 
as UMVUE, 534 


Variance, 89 
properties of, 89 
of sum of rv's, 159-160, 161 


Wald's equation, 597 
Weak law of large numbers, 257, 263, 291 
for weighted sums, 267 
Wilcoxon signed-ranks test, 548, 551 
Wilcoxon statistic, 548 
distribution, 550-551 
moments, 549 